Appendix A
Published online by Cambridge University Press: 07 January 2010
Summary
Outline
This is an artificial data set that was created with a known correlation structure. There are 5 variables and 30 cases. The variables fall into two groups of related (correlated) variables, so the effective dimensions of these data are 30 by 2 (30 cases by two underlying dimensions).
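The sketch below builds a data set with the same kind of structure and shows what "effective dimensions of 30 by 2" means in practice. It does not reproduce the book's data. Everything about the construction is an assumption made for illustration: V1–V3 are driven by one latent factor `f1`, V4–V5 by a second, independent factor `f2`, the noise standard deviation is 0.2, and the helper names `make_two_group_data` and `reduce_to_groups` are hypothetical. Averaging the standardized variables within each group collapses the 30 × 5 table to a 30 × 2 table of group scores.

```python
import random
import statistics


def make_two_group_data(n_cases=30, seed=1):
    """Simulate n_cases x 5 data: V1-V3 share latent factor f1, V4-V5 share f2.

    Hypothetical construction for illustration only (not the book's data).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_cases):
        f1 = rng.gauss(0, 1)  # latent factor behind V1-V3
        f2 = rng.gauss(0, 1)  # independent latent factor behind V4-V5
        group1 = [f1 + rng.gauss(0, 0.2) for _ in range(3)]  # V1, V2, V3
        group2 = [f2 + rng.gauss(0, 0.2) for _ in range(2)]  # V4, V5
        rows.append(group1 + group2)
    return rows


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reduce_to_groups(rows, groups=((0, 1, 2), (3, 4))):
    """Average the standardized variables within each group: n x 5 -> n x 2."""
    cols = list(zip(*rows))
    z = []
    for c in cols:
        m, s = statistics.fmean(c), statistics.stdev(c)
        z.append([(v - m) / s for v in c])  # z-scores put variables on one scale
    return [[statistics.fmean([z[j][i] for j in g]) for g in groups]
            for i in range(len(rows))]


rows = make_two_group_data()
reduced = reduce_to_groups(rows)
print(len(rows), len(rows[0]), "->", len(reduced), len(reduced[0]))  # 30 5 -> 30 2
```

Standardizing before averaging matters here because, as the summary statistics for the real data show, the variables can sit on very different scales; without it, a variable with a large mean would dominate its group score.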
The data
Descriptive statistics and relationships between variables
There are two obvious points to draw from these simple summary statistics. First, V2 has a much larger mean than any of the other variables. Second, although V4 has a relatively large mean, its standard deviation is small relative to that mean, whereas most of the other variables have standard deviations that are about 50% of their means.
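The comparison above is between each variable's standard deviation and its mean, i.e. its coefficient of variation (CV = sd / mean). A minimal sketch of how such a summary could be computed is shown below, assuming each variable is held as a named list of numbers; the function name `describe` and the toy columns are hypothetical, and the sample (n − 1) standard deviation is an assumption about how the table was produced.

```python
import statistics


def describe(columns):
    """Return {name: (mean, sd, cv)} for each named column of numbers.

    sd is the sample (n - 1) standard deviation; cv = sd / mean is
    reported as NaN when the mean is zero, where it is undefined.
    """
    out = {}
    for name, values in columns.items():
        m = statistics.fmean(values)
        s = statistics.stdev(values)
        cv = float("nan") if m == 0 else s / m
        out[name] = (m, s, cv)
    return out


# Hypothetical toy columns: "A" has sd at 50% of its mean (like most
# variables here), "B" has a large mean but a small sd (like V4).
for name, (m, s, cv) in describe({"A": [10, 20, 30], "B": [99, 100, 101]}).items():
    print(f"{name}: mean={m:.2f} sd={s:.2f} cv={cv:.2f}")
```

Reading the CV column rather than the raw standard deviations is what makes V4 stand out: its spread is not small in absolute terms so much as small compared with its mean.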
Figure V2 and Table V4 show two major groups of correlated predictors. The first three variables, V1–V3, are quite highly inter-correlated, particularly V2 and V3. V4 and V5 are also correlated with each other. The only other significant relationship is a weak negative correlation between V1 and V5.
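One simple way to recover groups like these from a correlation matrix is to link any two variables whose absolute Pearson correlation reaches a cut-off and then take the connected components, which amounts to single-linkage grouping on |r|. The sketch below assumes this approach and a cut-off of 0.7; both are illustrative choices, not the method or threshold used in the book, and the function name `correlated_groups` and the toy data are hypothetical.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlated_groups(columns, threshold=0.7):
    """Group variables linked by |r| >= threshold (connected components)."""
    names = list(columns)
    parent = {n: n for n in names}  # union-find forest over variable names

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical toy data with the same two-group pattern as V1-V5.
toy = {
    "V1": [1, 2, 3, 4, 5],
    "V2": [2, 4, 6, 8, 10],
    "V3": [1, 3, 2, 5, 4],
    "V4": [3, 1, 5, 2, 4],
    "V5": [6, 2, 10, 4, 8],
}
print(correlated_groups(toy))  # [['V1', 'V2', 'V3'], ['V4', 'V5']]
```

Taking the absolute value means strong negative correlations also link variables; a weak one, such as the V1–V5 relationship described above, falls below a cut-off like 0.7 and so does not merge the two groups.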
- Type: Chapter
- Information: Cluster and Classification Techniques for the Biosciences, pp. 200-202. Publisher: Cambridge University Press. Print publication year: 2006.