Texts used in the study

Douglas Biber

doi:10.1017/CBO9780511621024.010

As noted in Chapter 4, not all texts from the London–Lund and LOB corpora were included in the study, because of the time involved in editing the tagged texts. All genres included in the corpora, however, are represented in the study.

In addition, many of the text samples in the London–Lund corpus were divided, for one of two reasons. First, many of these 5,000-word samples actually comprise two or more shorter texts. For example, a typical telephone ‘text’ consists of several conversations juxtaposed so that the total number of words in the sample exceeds 5,000. In these cases, each conversation (or speech, broadcast, etc.) was separated and treated as a distinct text. Any text separated in this way that was shorter than 400 words was excluded from the analysis. (For the same reason, several of the letters that had been collected were excluded.)
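The separation rule can be sketched as follows. This is a hypothetical illustration, not the procedure actually used in the study: it assumes each constituent text (conversation, speech, broadcast) has already been isolated as a string, and it counts words as whitespace-delimited tokens, which the source does not specify. The function name `separate_constituent_texts` and the `min_words` parameter are invented for the example.

```python
def separate_constituent_texts(sample_parts, min_words=400):
    """Treat each constituent text of a composite sample as a distinct
    text, dropping any shorter than min_words words.

    Hypothetical sketch: words are counted as whitespace-delimited tokens.
    """
    kept = []
    for part in sample_parts:
        if len(part.split()) >= min_words:
            kept.append(part)
    return kept


# Usage: a composite sample made of three juxtaposed conversations.
sample = ["word " * 1200, "word " * 350, "word " * 3450]
texts = separate_constituent_texts(sample)
print(len(texts))  # 2: the 350-word conversation falls below the 400-word cutoff
```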

Second, text samples that did not consist of several different texts were divided into two samples of approximately 2,500 continuous words each. As a result, these samples are not ‘texts’ in the full sense: they are not bounded, and they do not contain all of the structural (textual) properties of a text. Many of the 2,000-word samples in the LOB corpus are also of this type; they do not represent entire books, articles, or even chapters, and so they do not represent entire ‘texts’ either.
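Dividing a single continuous sample can be sketched in the same hedged way. The source says only that the halves were "approximately 2,500" words each, so the exact cut point is unknown; this hypothetical `halve_sample` function simply cuts a list of word tokens at its midpoint, whereas the actual division may have fallen at a nearby sentence or turn boundary.

```python
def halve_sample(words):
    """Split a continuous sample (a list of word tokens) into two
    consecutive halves.

    Hypothetical sketch: cuts at the exact midpoint rather than at a
    sentence or turn boundary.
    """
    mid = len(words) // 2
    return words[:mid], words[mid:]


# Usage: a 5,000-word London-Lund sample yields two 2,500-word samples.
words = ["w"] * 5000
first, second = halve_sample(words)
print(len(first), len(second))  # 2500 2500
```

Splitting at the midpoint keeps the two halves contiguous and non-overlapping, so together they still cover the whole original sample.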
