Book contents
- Frontmatter
- Contents
- List of figures
- List of tables
- Acknowledgments
- Part I Background concepts and issues
- Part II Methodology
- Part III Dimensions and relations in English
- Appendix I Texts used in the study
- Appendix II Linguistic features: algorithms and functions
- Appendix III Mean frequency counts of all linguistic features in each genre
- Appendix IV Pearson correlation coefficients for all linguistic features
- References
- Index
Appendix I - Texts used in the study
Published online by Cambridge University Press: 05 June 2012
Summary
As noted in Chapter 4, not all texts from the London–Lund and LOB corpora were included in the study, because of the time involved in editing the tagged texts. All genres included in the corpora, however, are represented in the study.
In addition, many of the text samples in the London–Lund corpus were divided. Texts were divided for one of two reasons. The first is that many of these texts, which are 5,000 words long, actually comprise two or more shorter texts. For example, a typical telephone ‘text’ consists of several conversations which are juxtaposed so that the total number of words in the text sample exceeds 5,000. In these cases, each conversation (or speech, broadcast, etc.) was separated and treated as a distinct text. If a text thus separated was shorter than 400 words, it was excluded from the analysis. (For this same reason, several of the letters that had been collected were excluded.)
Text samples that did not consist of several different texts were divided to obtain two samples of approximately 2,500 (continuous) words each. As a result, these samples are not ‘texts’ in the strict sense: they are not bounded and do not contain all of the structural (textual) properties of a text. Many of the 2,000-word samples in the LOB corpus are also of this type; they do not represent entire books, articles, or even chapters, and so do not represent entire ‘texts’.
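The division procedure described above can be illustrated with a minimal sketch. This is not the procedure as actually implemented in the study; it only restates the two rules given in the text. The function name `split_sample`, the whitespace-based word count, and the choice to cut a single-text sample exactly at its midpoint are all assumptions made for illustration.

```python
MIN_WORDS = 400  # exclusion threshold stated in the text


def split_sample(components):
    """Divide one corpus text sample into the 'texts' used for analysis.

    `components` is a list of strings, one per component text
    (conversation, speech, broadcast, etc.) found in the sample.
    Hypothetical helper: words are counted by whitespace splitting,
    which is an assumption, not the study's counting method.
    """
    if len(components) > 1:
        # Sample made up of several shorter texts: treat each as a
        # distinct text and drop any shorter than 400 words.
        return [c for c in components if len(c.split()) >= MIN_WORDS]

    # Sample consisting of a single text: divide it into two
    # continuous halves (about 2,500 words each for a 5,000-word sample).
    # Cutting at the exact midpoint is an assumption.
    words = components[0].split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]
```

For a telephone sample made up of three conversations of 450, 100, and 400 words, this sketch keeps the first and third and drops the second.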
- Type: Chapter
- Book: Variation across Speech and Writing, pp. 208-210
- Publisher: Cambridge University Press
- Print publication year: 1988