Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- 2 A Warm-up
- 3 Random Sampling
- 4 List Ranking
- 5 Sorting Atomic Items
- 6 Set Intersection
- 7 Sorting Strings
- 8 The Dictionary Problem
- 9 Searching Strings by Prefix
- 10 Searching Strings by Substring
- 11 Integer Coding
- 12 Statistical Coding
- 13 Dictionary-Based Compressors
- 14 Block-Sorting Compression
- 15 Compressed Data Structures
- 16 Conclusion
- Index
10 - Searching Strings by Substring
Published online by Cambridge University Press: 08 June 2023
Summary
This chapter addresses the design of data structures and algorithms for the substring search problem, which arises mainly in computational biology and textual database applications. Most of the chapter describes the two principal data structures for this problem, the suffix array and the suffix tree. Pseudocode and illustrative examples accompany the discussion, together with an evaluation of the time, space, and I/O complexities incurred by constructing these structures and by executing some powerful query operations on them. In particular, the chapter covers the efficient or optimal construction of large suffix arrays in external memory, describing the DC3 algorithm and the I/O-efficient scan-based algorithm proposed by Gonnet, Baeza-Yates, and Snider. It also covers the efficient construction of suffix trees, either directly via McCreight’s algorithm or indirectly via suffix arrays and LCP arrays. Finally, it details the elegant internal-memory construction of the LCP array, which is fundamental to several text-mining applications, some of which are described at the end of the chapter.
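To make the suffix array and the LCP array concrete, here is a minimal Python sketch under stated assumptions. It is not the chapter's pseudocode: the function names and the `banana` example are illustrative choices, and the suffix array is built by naive sorting rather than by DC3 or the scan-based external-memory algorithm. The LCP array is built with the well-known linear-time rank-based scan (commonly attributed to Kasai et al.), which may or may not match the construction presented in the chapter.

```python
def build_suffix_array(text):
    # Naive construction (an assumption for brevity, not DC3):
    # sort the suffix start positions by the suffix each one denotes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Two binary searches over the suffix array delimit the contiguous
    # range of suffixes that have `pattern` as a prefix.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])  # occurrence positions in text order


def build_lcp_array(text, sa):
    # lcp[r] = length of the longest common prefix of the suffixes at
    # sa[r - 1] and sa[r]; lcp[0] is set to 0 by convention here.
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, reusing h
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


text = "banana"
sa = build_suffix_array(text)
lcp = build_lcp_array(text, sa)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(lcp)                                 # [0, 1, 3, 0, 0, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

The naive sort is adequate only for small inputs, since each comparison may scan a long suffix; the construction algorithms named in the summary exist precisely to avoid that cost on large texts.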
- Type: Chapter
- Information: Pearls of Algorithm Engineering, pp. 153-193
- Publisher: Cambridge University Press
- Print publication year: 2023