Book contents
- Frontmatter
- Contents
- Preface
- Chapter 1 Algorithms on Words
- Chapter 2 Structures for Indexes
- Chapter 3 Symbolic Natural Language Processing
- Chapter 4 Statistical Natural Language Processing
- Chapter 5 Inference of Network Expressions
- Chapter 6 Statistics on Words with Applications to Biological Sequences
- Chapter 7 Analytic Approach to Pattern Matching
- Chapter 8 Periodic Structures in Words
- Chapter 9 Counting, Coding, and Sampling with Words
- Chapter 10 Words in Number Theory
- References
- General Index
Chapter 3 - Symbolic Natural Language Processing
Published online by Cambridge University Press: 05 June 2013
Summary
Introduction
Fundamental notions of combinatorics on words underlie natural language processing. This is not surprising, since combinatorics on words can be seen as the formal study of sets of strings, and sets of strings are fundamental objects in language processing.
Indeed, language processing is obviously a matter of strings. A text or a discourse is a sequence of sentences; a sentence is a sequence of words; a word is a sequence of letters. The most universal levels are those of sentence, word, and letter (or phoneme), but intermediate levels exist, and can be crucial in some languages, between word and letter: a level of morphological elements (e.g. suffixes), and the level of syllables. The discovery of this piling up of levels, and in particular of word level and phoneme level, delighted structuralist linguists in the twentieth century. They termed this inherent, universal feature of human language “double articulation”.
It is a little more intricate to see how sets of strings are involved. There are two main reasons. First, at any point in a linguistic flow of data being processed, you must be able to predict the set of possible continuations after what is already known, or at least to expect any continuation among some set of strings that depends on the language. Second, natural languages are ambiguous: a written or spoken portion of text can often be understood or analysed in several ways, and the analyses are handled as a set of strings as long as they cannot be reduced to a single analysis.
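Both reasons can be illustrated with a minimal Python sketch. It is not taken from the chapter: the toy language, the toy lexicon, the tag names N and V, and the function names continuations and analyses are all assumptions made for illustration. The first function returns the set of continuations of a known prefix within a finite set of strings; the second enumerates every analysis of an ambiguous sentence as a set of tagged strings.

```python
from itertools import product

# Hypothetical toy language: a finite set of sentences (illustration only).
LANGUAGE = {
    "the cat sleeps",
    "the cat eats",
    "the dog sleeps",
}

# Hypothetical toy lexicon mapping each word to its possible tags.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
}


def continuations(language, prefix):
    """Return every string w such that prefix + w belongs to language."""
    return {s[len(prefix):] for s in language if s.startswith(prefix)}


def analyses(sentence, lexicon):
    """Return all taggings of sentence, each encoded as a single string."""
    words = sentence.split()
    # One list of candidate tags per word, in a fixed order.
    tag_choices = [sorted(lexicon[w]) for w in words]
    return {
        " ".join(f"{w}/{t}" for w, t in zip(words, tags))
        for tags in product(*tag_choices)
    }


if __name__ == "__main__":
    print(continuations(LANGUAGE, "the cat "))
    print(analyses("time flies", LEXICON))
```

In this sketch, the continuations of "the cat " are the two strings "sleeps" and "eats", and "time flies" receives four candidate analyses until further context rules some of them out. A realistic system would represent such sets with finite automata rather than explicit Python sets, since the sets of interest are usually very large or infinite.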
- Type: Chapter
- Information: Applied Combinatorics on Words, pp. 164-209. Publisher: Cambridge University Press. Print publication year: 2005.