Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 Introduction
- 2 A Warm-up
- 3 Random Sampling
- 4 List Ranking
- 5 Sorting Atomic Items
- 6 Set Intersection
- 7 Sorting Strings
- 8 The Dictionary Problem
- 9 Searching Strings by Prefix
- 10 Searching Strings by Substring
- 11 Integer Coding
- 12 Statistical Coding
- 13 Dictionary-Based Compressors
- 14 Block-Sorting Compression
- 15 Compressed Data Structures
- 16 Conclusion
- Index
5 - Sorting Atomic Items
Published online by Cambridge University Press: 08 June 2023
Summary
This chapter revisits the classic sorting problem in the context of big inputs, where “Atomic” in the title refers to the fact that items occupy few memory words and are managed in their entirety by executing only comparisons. It discusses two classic sorting paradigms: the merge-based paradigm, which underlies the design of MergeSort, and the distribution-based paradigm, which underlies the design of QuickSort. It shows how to adapt them to work in a hierarchical memory setting, analyzes their I/O complexity, and proposes some useful algorithmic tools that speed up their execution in practice, such as the Snow-Plow technique and data compression. It also proves that these adaptations are I/O-optimal in the two-level memory model by providing a sophisticated, yet very informative, lower bound. These results allow us to relate the sorting problem to the so-called permuting problem, typically neglected when dealing with sorting in the RAM model, and then to argue for an interesting I/O-complexity equivalence between these two problems, which provides a mathematical ground for the ubiquitous use of sorters when designing I/O-efficient solutions for big data problems.
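The summary names the Snow-Plow technique without describing it. As a rough illustration only, the sketch below assumes it corresponds to the classic replacement-selection scheme for forming sorted runs with a bounded in-memory heap; the function name `snow_plow_runs` and the parameter `memory_size` are hypothetical and not taken from the chapter. The runs it produces are then combined with a multiway merge, in the spirit of the merge-based paradigm.

```python
import heapq
from typing import Iterable, Iterator, List

_END = object()  # sentinel marking the end of the input stream


def snow_plow_runs(items: Iterable[int], memory_size: int) -> Iterator[List[int]]:
    """Yield sorted runs using at most `memory_size` items held in memory.

    Assumption: this is plain replacement selection, used here as a
    stand-in for the Snow-Plow technique mentioned in the summary.
    """
    if memory_size < 1:
        raise ValueError("memory_size must be at least 1")

    stream = iter(items)

    # Fill the in-memory heap with the first `memory_size` items.
    heap: List[int] = []
    for x in stream:
        heap.append(x)
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)

    frozen: List[int] = []  # items that must wait for the next run
    run: List[int] = []     # the run currently being written out

    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)

        nxt = next(stream, _END)
        if nxt is not _END:
            if nxt >= smallest:
                # Still fits in the current run, so it joins the heap.
                heapq.heappush(heap, nxt)
            else:
                # Too small for the current run; keep it for the next one.
                frozen.append(nxt)

        if not heap:
            # Current run is complete; the frozen items seed the next run.
            yield run
            run = []
            heap = frozen
            heapq.heapify(heap)
            frozen = []


data = [5, 1, 9, 3, 7, 2, 8, 4, 6]
runs = list(snow_plow_runs(data, memory_size=3))
merged = list(heapq.merge(*runs))  # multiway merge of the sorted runs
```

With only three items in memory, the first run in this example already holds six items, which hints at why such a run-formation step can reduce the number of runs to be merged compared with sorting fixed memory-sized chunks.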
- Type: Chapter
- Information: Pearls of Algorithm Engineering, pp. 44-71
- Publisher: Cambridge University Press
- Print publication year: 2023