Book contents
- Frontmatter
- Dedication
- Contents
- PREFACE
- 1 Introduction
- 2 Workload Data
- 3 Statistical Distributions
- 4 Fitting Distributions to Data
- 5 Heavy Tails
- 6 Correlations in Workloads
- 7 Self-Similarity and Long-Range Dependence
- 8 Hierarchical Generative Models
- 9 Case Studies
- 10 Summary and Outlook
- Appendix Data Sources
- Bibliography
- Index
8 - Hierarchical Generative Models
Published online by Cambridge University Press: 05 March 2015
Summary
The discovery of self-similarity and the quest to model it required a radical departure from previous practices. Before this discovery, workload items were considered to be independent of each other. Modeling could then be done by sampling workload items from the relevant distributions, subject to desired correlations between workload attributes. But self-similar workloads have an internal structure. Workload items are no longer distributed uniformly over time; they come in bursts, at many different time scales.
The next step is to consider whether it is only the arrivals that are bursty. Self-similarity says that workload items come in bursts, but says nothing about the nature of the items in a burst. Do the attributes of the burst items have any special characteristics? For example, do job sizes also come in “bursts,” or are they just randomly selected from a distribution? And what about runtimes, memory usage, and so on?
The answer to these questions is that all workload attributes tend to come in bursts. It is common to see a burst of jobs with similar attributes, and then a burst of jobs with other attributes, and so on [239, 109]. Each burst is characterized by rather modal distributions, often concentrated around a single value. The wider distributions describing the entire workload are actually a combination of the modal distributions of the different bursts of activity. Thus the collective distribution does not capture the workload dynamics unless we postulate locality of sampling: instead of sampling from the whole distribution, sample first from one part, then from another, and so on.
A good way to produce locality of sampling is by using a hierarchical model: first select the part of the distribution on which to focus, and then sample from this selected region. This chapter suggests that such a model can and should be generative. This means that it should mimic and model the processes that create the real workload. Such models are therefore called hierarchical generative models (HGM).
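The two-level idea can be illustrated with a minimal sketch; it is not the chapter's own model. It assumes that each "part of the distribution" is a mode given by a hypothetical (center, spread) pair, that burst lengths are geometric, and that items within a burst are drawn from a normal distribution around the mode's center. The names `sample_hierarchical` and `mean_adjacent_diff` are illustrative only.

```python
import random


def sample_hierarchical(modes, weights, n_items, mean_burst_len, rng):
    """Two-level sampling: pick a mode, then draw a burst of items near it.

    modes is a list of (center, spread) pairs (an assumed representation).
    """
    items = []
    while len(items) < n_items:
        # Top level: select the part of the distribution to focus on.
        center, spread = rng.choices(modes, weights=weights, k=1)[0]
        # Assumption: burst length is geometric with the given mean.
        burst_len = 1
        while rng.random() > 1.0 / mean_burst_len:
            burst_len += 1
        # Bottom level: sample the burst's items from the selected mode.
        for _ in range(burst_len):
            items.append(max(1, round(rng.gauss(center, spread))))
    return items[:n_items]


def mean_adjacent_diff(items):
    """Average absolute difference between consecutive items.

    A crude locality indicator: small values mean neighbors are similar.
    """
    return sum(abs(a - b) for a, b in zip(items, items[1:])) / (len(items) - 1)


# Hypothetical job-size modes: (center, spread).
modes = [(1, 0.5), (32, 1.0), (128, 2.0)]
weights = [0.5, 0.3, 0.2]

rng = random.Random(42)
bursty = sample_hierarchical(modes, weights, 5000, 20, rng)

# Shuffling keeps the collective distribution but removes the locality.
shuffled = bursty[:]
rng.shuffle(shuffled)

print("bursty  :", mean_adjacent_diff(bursty))
print("shuffled:", mean_adjacent_diff(shuffled))
```

Both sequences contain exactly the same values, so their collective distributions are identical; only the ordering differs. The bursty sequence should show a much smaller average difference between neighboring items, which is the locality that sampling from the whole distribution would miss.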
- Type: Chapter
- Information: Workload Modeling for Computer Systems Performance Evaluation, pp. 357-398. Publisher: Cambridge University Press. Print publication year: 2015.