Book contents
- Frontmatter
- Dedication
- Contents
- PREFACE
- 1 Introduction
- 2 Workload Data
- 3 Statistical Distributions
- 4 Fitting Distributions to Data
- 5 Heavy Tails
- 6 Correlations in Workloads
- 7 Self-Similarity and Long-Range Dependence
- 8 Hierarchical Generative Models
- 9 Case Studies
- 10 Summary and Outlook
- Appendix Data Sources
- Bibliography
- Index
5 - Heavy Tails
Published online by Cambridge University Press: 05 March 2015
Summary
There are several advanced topics in statistical modeling that are typically not encountered at the introductory level. However, they reflect real-life situations, so they cannot be ignored. Probably the most prominent of them is heavy tails.
In fact, a very common situation is that distributions have many small elements and few large elements. The question is how dominant the large elements are relative to the small ones. In heavy-tailed distributions, the rare large elements (from the tail of the distribution) dominate, leading to a highly skewed distribution. Examples where this is the case include the following:
▪ The distribution of process runtimes [319].
▪ The distribution of file sizes in a file system or retrieved from a web server [360, 155, 57, 188].
▪ The distribution of popularity of items on a web server and of websites [57, 85, 571, 9, 524].
▪ The distribution of flows in the Internet, both in terms of size and duration [609, 215].
▪ The distribution of think times for interactive users [146].
▪ The distribution of the number of queries sent by a peer on a P2P network and the distribution of session lengths [311].
▪ The distribution of register instance lifetimes (the time until a value in a register is overwritten) [198].
▪ The distribution of in and out degrees in a social network [491].
▪ The distribution of in and out degrees in the Internet topology graph [226, 89].
The qualifier “rare” regarding the tail elements is actually confusing. For example, the most popular page on a web server, the one that dominates the downloads, is popular rather than rare when we look at the list of downloads. But it is rare if we look at the population of web pages: there is only one such page. This duality leads to the effect of mass-count disparity, discussed in Section 5.2.2.
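The dominance of the tail can be illustrated with a small numerical sketch. The code below is not taken from the chapter; it assumes a Pareto distribution with shape parameter 1.1 as a stand-in for a heavy-tailed workload and an exponential distribution with mean 1 as a light-tailed contrast. The helper `top_share` is a hypothetical name introduced here. It reports what fraction of the total mass (for example, total bytes or total CPU time) comes from the largest 1% of the items.

```python
import random


def top_share(samples, fraction=0.01):
    """Return the share of the total mass held by the largest `fraction` of samples."""
    ordered = sorted(samples, reverse=True)
    k = max(1, int(len(ordered) * fraction))  # number of items counted as "the tail"
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)  # fixed seed so repeated runs give the same samples
n = 100_000

# Assumed heavy-tailed example: Pareto with shape 1.1 (illustrative value only).
pareto = [rng.paretovariate(1.1) for _ in range(n)]

# Light-tailed contrast: exponential with mean 1.
expo = [rng.expovariate(1.0) for _ in range(n)]

print(f"Pareto:      top 1% of items hold {top_share(pareto):.1%} of the mass")
print(f"Exponential: top 1% of items hold {top_share(expo):.1%} of the mass")
```

Under these assumptions, the top 1% of the Pareto samples should account for a far larger share of the total than the top 1% of the exponential samples. The items are rare by count but dominant by mass, which is the duality described above.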
- Type: Chapter
- Information: Workload Modeling for Computer Systems Performance Evaluation, pp. 163-212. Publisher: Cambridge University Press. Print publication year: 2015