Book contents
- Frontmatter
- Contents
- List of figures
- List of tables
- Acknowledgments
- Part I Background concepts and issues
- Part II Methodology
- Part III Dimensions and relations in English
- Appendix I Texts used in the study
- Appendix II Linguistic features: algorithms and functions
- Appendix III Mean frequency counts of all linguistic features in each genre
- Appendix IV Pearson correlation coefficients for all linguistic features
- References
- Index
Appendix II - Linguistic features: algorithms and functions
Published online by Cambridge University Press: 05 June 2012
Summary
Development of computer programs for grammatical analysis
One of the distinctive characteristics of the present study is the inclusion of a large number of linguistic features representing the range of functional possibilities in English. Further, these features are counted in a large number of texts and genres, to exclude idiosyncratic variation and to ensure inclusion of the range of situational and linguistic variation existing within speaking and writing in English.
The use of computerized text corpora and computer programs for the automatic identification of linguistic features made it possible to carry out a study of this scope. The programs, which are written in PL/1, use the untagged versions of the LOB and London–Lund corpora as input. In a tagged corpus, such as the Brown corpus, the words in a text are all marked, or ‘tagged’, for their grammatical category, greatly facilitating automatic syntactic analysis. A tagged version of the LOB corpus became available during the course of the present study, but it was not used because there is no comparable version of the London–Lund corpus (the spoken texts). That is, programs that took advantage of the grammatical tagging in the LOB corpus would identify features in the written texts more accurately than was possible in the London–Lund corpus, thus skewing the comparison of spoken and written genres. Therefore, the untagged versions of both corpora were used, and a single set of programs was developed for the analysis of both.
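The difference between tagged and untagged input can be illustrated with a minimal Python sketch. It is not a reconstruction of the original PL/1 programs: the word list, the function names, the `word_TAG` input format, and the tag labels are all assumptions made for illustration only. With untagged text, a feature defined by a closed word list can be counted by direct lookup, whereas a feature defined by grammatical category would have to be inferred from context; with tagged text, the category is simply read off each token.

```python
import re

# Hypothetical closed word list for illustration; not taken from the original programs.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def tokenize(text):
    """Split untagged running text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_untagged(text, word_list):
    """Count tokens that belong to a closed word list; no tags are needed."""
    return sum(1 for token in tokenize(text) if token in word_list)


def count_tagged(tagged_text, tag):
    """Count tokens carrying a given tag, assuming a 'word_TAG' format.

    The format and the tag names are assumptions, not those of any real corpus.
    """
    pairs = (item.rsplit("_", 1) for item in tagged_text.split())
    return sum(1 for pair in pairs if len(pair) == 2 and pair[1] == tag)
```

For example, `count_untagged("I think we saw our friends", FIRST_PERSON)` counts three first-person pronouns without any tagging, while `count_tagged("the_DET dog_NOUN barks_VERB", "NOUN")` relies entirely on the tags supplied with the text.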
- Variation across Speech and Writing, pp. 211-245. Publisher: Cambridge University Press. Print publication year: 1988