Book contents
- Frontmatter
- Dedication
- Contents
- PREFACE
- 1 Introduction
- 2 Workload Data
- 3 Statistical Distributions
- 4 Fitting Distributions to Data
- 5 Heavy Tails
- 6 Correlations in Workloads
- 7 Self-Similarity and Long-Range Dependence
- 8 Hierarchical Generative Models
- 9 Case Studies
- 10 Summary and Outlook
- Appendix Data Sources
- Bibliography
- Index
4 - Fitting Distributions to Data
Published online by Cambridge University Press: 05 March 2015
Summary
This chapter considers the problem of finding a distribution that fits given data. The data have a so-called empirical distribution – a list of all the observed values and how many times each one of them has occurred. The goal is to find a distribution function that is a good match to these data, meaning that if we sample it we will get a list of values similar to the observed list. Note, however, that we never expect to get a perfect match. One reason is that randomness is at work – two different sets of samples from the same distribution will nevertheless be different. More importantly, there is no reason to believe that the original data were indeed sampled from our model distribution. The running times of Unix processes are not samples from an exponential distribution, or a Pareto distribution, or any other distribution function that has a nice mathematical formulation.
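As a minimal sketch of the two ideas in this paragraph, the Python snippet below builds an empirical distribution (each observed value paired with its number of occurrences) and draws two samples from the same distribution to show that they differ. The names `empirical_distribution` and `draw_sample`, the exponential rate, and the truncation to whole units are all assumptions made for illustration; they do not come from the chapter.

```python
import random
from collections import Counter

def empirical_distribution(values):
    """Map each observed value to the number of times it occurred."""
    return dict(Counter(values))

def draw_sample(rng, n, rate=0.5):
    """Draw n exponential variates, truncated to whole units.

    The rate and the truncation are illustrative assumptions only.
    """
    return [int(rng.expovariate(rate)) for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run is repeatable
sample_a = draw_sample(rng, 1000)
sample_b = draw_sample(rng, 1000)

dist_a = empirical_distribution(sample_a)
dist_b = empirical_distribution(sample_b)

# Print the counts for the five smallest observed values side by side.
for value in sorted(set(dist_a) | set(dist_b))[:5]:
    print(value, dist_a.get(value, 0), dist_b.get(value, 0))
```

Both samples come from the same generator with the same parameters, yet the per-value counts will generally not agree, which is the randomness the paragraph refers to.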
But we can hope to find a distribution function that is a close enough match. The real meaning of “close enough” is that it will produce reliable results if used in a performance evaluation study. Because this is impossible to assess in practice, we settle for statistical definitions. For example, we may require that the distribution's moments be close to those of the data or that its shape be close to that of the empirical distribution. Thus this entire chapter is concerned with the basic methods of descriptive modeling.
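The two statistical criteria mentioned above, matching moments and matching shape, can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the data are synthetic, the model is an exponential distribution fitted by matching the first moment, and shape is compared using the largest gap between the empirical and model CDFs (the Kolmogorov-Smirnov distance), which is one possible choice rather than a method prescribed by this summary. The function names are hypothetical.

```python
import math
import random

def fit_exponential_by_mean(data):
    """Match the first moment: an exponential with rate r has mean 1/r."""
    return len(data) / sum(data)

def exponential_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_distance(data, cdf):
    """Largest vertical gap between the empirical CDF of data and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(model - i / n), abs((i + 1) / n - model))
    return gap

# Synthetic data (an assumption): real workload data would be loaded instead.
rng = random.Random(42)
data = [rng.expovariate(2.0) for _ in range(5000)]

rate = fit_exponential_by_mean(data)
distance = ks_distance(data, lambda x: exponential_cdf(x, rate))
print(f"fitted rate = {rate:.3f}, KS distance = {distance:.4f}")
```

Because the synthetic data really are exponential, the distance comes out small; with real workload data, a large distance would suggest that the chosen distribution family is a poor match in shape even when the mean agrees.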
To read more: Although we cover the basics of fitting distributions here, there is much more to this subject. An excellent reference, including the description of many different distributions, is the book by Law and Kelton [426]. It has the advantage of placing the discussion in the context of modeling and simulation. There are many statistics texts that discuss fitting distributions per se (e.g., DeGroot [169] and Montgomery and Runger [500]).
- Type: Chapter
- Information: Workload Modeling for Computer Systems Performance Evaluation, pp. 130-162
- Publisher: Cambridge University Press
- Print publication year: 2015