Book contents
- Frontmatter
- Dedication
- Contents
- Foreword
- Preface
- Acknowledgments
- About the sketches
- I ANALYSIS
- Chapter 1 Probabilistic Models
- Chapter 2 Exact String Matching
- Chapter 3 Constrained Exact String Matching
- Chapter 4 Generalized String Matching
- Chapter 5 Subsequence String Matching
- II APPLICATIONS
- Bibliography
- Index
Chapter 4 - Generalized String Matching
from I - ANALYSIS
Published online by Cambridge University Press: 05 July 2015
Summary
In this chapter we consider generalized pattern matching, in which a set of patterns (rather than a single pattern) is given. We assume here that the pattern is a pair of sets of words $(W_0, W)$, where $W = \bigcup_{i=1}^{d} W_i$ consists of the sets $W_i \subset A^{m_i}$ (i.e., all words in $W_i$ have the same fixed length $m_i$). The set $W_0$ is called the forbidden set. For $W_0 = \emptyset$ one is interested in the number of pattern occurrences $O_n(W)$, defined as the number of occurrences of patterns from $W$ in a text of length $n$ generated by a (random) source. Another parameter of interest is the number of positions in the text where a pattern from $W$ appears (clearly, several patterns may occur at the same position, but words from $W_i$ must occur at different positions); we denote this quantity by $\Pi_n(W)$. If we define $\Pi_n^{(i)}(W)$ as the number of positions where a word from $W_i$ occurs, then

$$O_n(W) = \Pi_n^{(1)}(W) + \cdots + \Pi_n^{(d)}(W).$$
Notice that, at any given position of the text and for a given $i$, only one word from $W_i$ can occur, since all words in $W_i$ have the same length.
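To make these definitions concrete, the following brute-force Python sketch counts $O_n(W)$, the per-group counts $\Pi_n^{(i)}(W)$ (positions where a word from $W_i$ occurs), and $\Pi_n(W)$ for one fixed text. It is only an illustration, not the chapter's analytic method. It assumes, as our own convention, that an occurrence is indexed by the position where it ends and that the groups $W_i$ have distinct lengths; the function names and the example sets are hypothetical.

```python
from typing import List, Set


def end_positions(text: str, words: Set[str]) -> Set[int]:
    """1-based positions at which some word of `words` ends in `text`.

    Assumption: an occurrence is indexed by its ending position.
    """
    found: Set[int] = set()
    for w in words:
        m = len(w)
        for end in range(m, len(text) + 1):
            if text[end - m:end] == w:
                found.add(end)
    return found


def occurrences(text: str, words: Set[str]) -> int:
    """O_n: number of (possibly overlapping) occurrences of words from `words`."""
    return sum(
        1
        for w in words
        for end in range(len(w), len(text) + 1)
        if text[end - len(w):end] == w
    )


def pi_group(text: str, group: Set[str]) -> int:
    """Pi_n^{(i)}: number of positions where a word from W_i occurs."""
    return len(end_positions(text, group))


def pi_all(text: str, groups: List[Set[str]]) -> int:
    """Pi_n: number of positions where any pattern of W = W_1 u ... u W_d occurs."""
    hits: Set[int] = set()
    for group in groups:
        hits |= end_positions(text, group)
    return len(hits)


# Hypothetical example: W_1 = {ab, ba} (length 2), W_2 = {bab} (length 3).
text = "abababb"
groups = [{"ab", "ba"}, {"bab"}]
W = set().union(*groups)

o_n = occurrences(text, W)
per_group = [pi_group(text, g) for g in groups]
print(o_n, per_group, pi_all(text, groups))  # 7 [5, 2] 5
assert o_n == sum(per_group)  # O_n(W) equals the sum of the Pi_n^{(i)}(W)
```

In the example, $\Pi_n(W) = 5$ is smaller than $O_n(W) = 7$ because `bab` ends at positions 4 and 6, where `ab` also ends.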
For $W_0 \neq \emptyset$ one studies the number of occurrences $O_n(W)$ under the condition that $O_n(W_0) = 0$, that is, that no pattern from $W_0$ occurs in the text. This could be called constrained pattern matching, since one restricts the text to strings that do not contain any word from $W_0$. A simple version of constrained pattern matching was discussed in Chapter 3 (see also Exercises 3.3, 3.6, and 3.10).
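For very small $n$ the constrained quantity can be checked by exhaustive enumeration. The Python sketch below assumes a uniform memoryless source, so that every text of length $n$ is equally likely and the conditional mean of $O_n(W)$ given $O_n(W_0) = 0$ is a plain average over the admissible texts. The function names and the example sets are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Set, Tuple


def occurrences(text: str, words: Set[str]) -> int:
    """Number of (possibly overlapping) occurrences of words from `words` in `text`."""
    return sum(
        1
        for w in words
        for end in range(len(w), len(text) + 1)
        if text[end - len(w):end] == w
    )


def constrained_mean(
    n: int, alphabet: str, forbidden: Set[str], W: Set[str]
) -> Tuple[int, Fraction]:
    """Return (number of admissible texts, E[O_n(W) | O_n(W0) = 0]).

    Assumption: uniform memoryless source, so the conditional mean is the
    average of O_n(W) over all length-n texts avoiding the forbidden set W0.
    """
    admissible = 0
    total = 0
    for letters in product(alphabet, repeat=n):
        text = "".join(letters)
        if occurrences(text, forbidden) > 0:  # text contains a word from W0
            continue
        admissible += 1
        total += occurrences(text, W)
    if admissible == 0:
        raise ValueError("every text of length n contains a forbidden word")
    return admissible, Fraction(total, admissible)


# Hypothetical example: binary texts of length 4 with W0 = {bb} and W = {ab}.
print(constrained_mean(4, "ab", {"bb"}, {"ab"}))  # (8, Fraction(7, 8))
```

The enumeration scans $|A|^n$ texts, so it is only practical for sanity-checking small cases against analytic results.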
In this chapter we first present an analysis of generalized pattern matching with $W_0 = \emptyset$ and $d = 1$, a case we call the reduced pattern set (i.e., no pattern is a substring of another pattern).
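The parenthetical definition of a reduced set can be checked mechanically. In this minimal Python sketch the helper names are hypothetical; it also illustrates that distinct words sharing one length (the case $d = 1$) can never be proper substrings of one another, so such a set always passes the check.

```python
from typing import Iterable


def is_reduced(patterns: Iterable[str]) -> bool:
    """True if no pattern is a proper substring of another pattern in the set."""
    pats = set(patterns)
    return not any(u != v and u in v for u in pats for v in pats)


def has_single_length(patterns: Iterable[str]) -> bool:
    """True if all patterns share one length, i.e. d = 1 in W = W_1 u ... u W_d."""
    return len({len(p) for p in patterns}) <= 1


print(is_reduced({"ab", "ba"}), has_single_length({"ab", "ba"}))    # True True
print(is_reduced({"ab", "bab"}), has_single_length({"ab", "bab"}))  # False False
```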
- Type: Chapter
- Information: Analytic Pattern Matching: From DNA to Twitter, pp. 75-108. Publisher: Cambridge University Press. Print publication year: 2015.