Linguistic features: algorithms and functions

Douglas Biber

doi:10.1017/CBO9780511621024.011

Development of computer programs for grammatical analysis

One of the distinctive characteristics of the present study is the inclusion of a large number of linguistic features representing the range of functional possibilities in English. Further, these features are counted in a large number of texts and genres, to exclude idiosyncratic variation and to ensure inclusion of the range of situational and linguistic variation existing within speaking and writing in English.

The use of computerized text corpora and computer programs for the automatic identification of linguistic features made it possible to carry out a study of this scope. The programs, which are written in PL/1, use the untagged versions of the LOB and London–Lund corpora as input. In a tagged corpus, such as the Brown corpus, the words in a text are all marked, or ‘tagged’, for their grammatical category, greatly facilitating automatic syntactic analysis. A tagged version of the LOB corpus became available during the course of the present study, but it was not used because there is no comparable version of the London–Lund corpus (the spoken texts). That is, programs that took advantage of the grammatical tagging in the LOB corpus would identify features with greater accuracy than is possible in the untagged London–Lund corpus, thus skewing the comparison of spoken and written genres. Therefore, the untagged versions of both corpora were used, and a single set of programs was developed for the analysis of both.
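The difference between tagged and untagged input can be illustrated with a minimal sketch in Python. It is not a reconstruction of the study's PL/1 programs: the toy sentence, the tag names (such as VERB_PAST), and the "-ed" suffix heuristic are all hypothetical, chosen only to show why the same feature may be counted with different accuracy depending on whether grammatical tags are available.

```python
# Toy illustration only. The sentence, the tag labels, and the "-ed"
# heuristic are assumptions made for this sketch; they are not the
# algorithms or tag set used in the study.

# A "tagged" text: each word is paired with a grammatical category.
tagged = [
    ("She", "PRON"), ("walked", "VERB_PAST"), ("to", "PREP"),
    ("the", "DET"), ("red", "ADJ"), ("shed", "NOUN"),
    ("and", "CONJ"), ("waited", "VERB_PAST"),
]

# The "untagged" version of the same text: words only.
untagged = [word for word, _ in tagged]


def count_past_from_tags(tokens):
    """Count past-tense verbs by reading the tag directly."""
    return sum(1 for _, tag in tokens if tag == "VERB_PAST")


def count_past_from_words(words):
    """Approximate the same count from word forms alone.

    A crude suffix check also matches non-verbs such as "red" and "shed".
    """
    return sum(1 for word in words if word.lower().endswith("ed"))


tag_count = count_past_from_tags(tagged)
heuristic_count = count_past_from_words(untagged)

print("count using tags:     ", tag_count)
print("count using word form:", heuristic_count)
```

In this toy case the two counts disagree, because the word-form heuristic also matches "red" and "shed". If one corpus were analysed with tags and the other without, a gap of this kind would be confounded with any real difference between the spoken and written texts.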

Appendix II - Linguistic features: algorithms and functions