Characteristics of tagged corpora

Douglas Biber; Susan Conrad; Randi Reppen

doi:10.1017/CBO9780511804489.015

4 - Characteristics of tagged corpora

Published online by Cambridge University Press: 05 June 2012


Douglas Biber, Affiliation: Northern Arizona University
Susan Conrad, Affiliation: Iowa State University
Randi Reppen, Affiliation: Northern Arizona University


Summary

The main use of an uncoded corpus is searching for a particular word or sequence of words. Concordancing software is designed especially to allow searches of this type, for example, to examine frequencies of words or collocations, or to find examples of certain words or structures. However, many linguistic investigations – including most of the analyses in this book – are not possible if we are restricted to simply searching for words. Even structures that conform to fairly regular morphological or syntactic patterns are not easy to study based on word searches.

Suppose, for example, that you wanted to investigate the use of passive voice. With an uncoded corpus, you might start by searching for any form of be plus a word ending in -en. This search pattern would find many passives – such as was eaten and is taken – but it would miss the passives ending in -ed (e.g., been carried, was kicked). Even if you expanded your search to include these passives, you would still miss all the irregular passives, of which there are many – e.g., shown, torn, built, kept, sold, meant, brought. Each ends with a different sequence of letters, making patterned searches impossible. In addition, you would miss instances that have intervening adverbs (e.g., was completely eaten). Conversely, this search pattern would fit structures that are not passives, such as was green, is red.
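The limits of this word-based search can be illustrated with a minimal Python sketch. The regular expression, the list of be forms, the function name naive_passive_hits, and the sample sentences are all illustrative assumptions; they are not taken from the chapter or from any particular concordancing package.

```python
import re

# Assumed list of "be" forms for the naive search (illustrative only).
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# Naive pattern: a form of "be" immediately followed by a word ending in -en.
NAIVE_PASSIVE = re.compile(rf"\b{BE_FORMS}\s+(\w+en)\b", re.IGNORECASE)


def naive_passive_hits(text):
    """Return every 'be + word ending in -en' string found in text."""
    return [m.group(0) for m in NAIVE_PASSIVE.finditer(text)]


samples = [
    "The cake was eaten.",             # regular -en passive: found
    "The seat is taken.",              # regular -en passive: found
    "The ball was kicked.",            # -ed passive: missed
    "The house was built in 1900.",    # irregular passive: missed
    "The cake was completely eaten.",  # intervening adverb: missed
    "The door was green.",             # not a passive: wrongly found
]

for sentence in samples:
    print(sentence, "->", naive_passive_hits(sentence))
```

On these samples the pattern returns a hit for was eaten, is taken, and was green, and an empty list for the other three sentences. That reproduces both kinds of error described above: missed passives and a false match on an adjective.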

Type: Chapter
Information: Corpus Linguistics: Investigating Language Structure and Use, pp. 257-260

DOI: https://doi.org/10.1017/CBO9780511804489.015

Publisher: Cambridge University Press

Print publication year: 1998
