Regression analysis

Norman T. J. Bailey

doi:10.1017/CBO9781139170840.011

THE BASIC IDEA OF REGRESSION

In the previous chapter we saw how the association between two measurements, such as the statures of father–son pairs or the heights and weights of a series of individuals, could be calculated in terms of the correlation coefficient. For this procedure to be reasonably satisfactory it was necessary that the two measurements followed a bivariate normal distribution, at least approximately. With heights and weights, for instance, and many other physical measurements, the assumption is probably not too far from the truth. In general, if we select individuals at random and make measurements on them, or alternatively select some randomly chosen unit such as a family, then the pairs of readings certainly have a bivariate distribution of some kind. With luck it will also be approximately bivariate normal. If, however, we specially select individuals on the basis of one measurement and afterwards record the other, the first measurement properly speaking has no distribution: it may have been decided quite arbitrarily.

Thus, the data in Table 14 are quite suitable for the correlation type of analysis: a family is randomly chosen, and the father and one son (also randomly chosen) are measured. Suppose now that we were specially interested in the sons of very tall and very short fathers. We might well consider that Table 14 contained insufficient material about these extremes, and might decide to swell the sample by an extra 100 fathers of about 1.58 m in height and another 100 of about 1.84 m.
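The effect of swelling the sample in this way can be sketched with a small simulation. This is only an illustration under stated assumptions, not an analysis of Table 14: the population mean, standard deviation and father–son correlation below are hypothetical, as are the function names `sons_of`, `summarise` and `compare`. Sons' heights are generated from a straight-line dependence on fathers' heights with normal scatter, and the same summary statistics are computed for a purely random sample and for the sample with the extra 200 specially selected fathers added.

```python
import numpy as np

# All numerical values here are hypothetical, chosen only for illustration;
# none of them is taken from Table 14.
MEAN_HEIGHT = 1.71   # metres, assumed population mean for fathers and sons
SD_HEIGHT = 0.065    # metres, assumed population standard deviation
RHO = 0.5            # assumed father-son correlation in the population


def sons_of(fathers, rng):
    """Draw one son's height per father: linear in the father's height plus normal scatter."""
    residual_sd = SD_HEIGHT * np.sqrt(1.0 - RHO**2)
    scatter = rng.normal(0.0, residual_sd, fathers.size)
    return MEAN_HEIGHT + RHO * (fathers - MEAN_HEIGHT) + scatter


def summarise(fathers, sons):
    """Return the correlation coefficient and the least-squares slope of sons on fathers."""
    r = np.corrcoef(fathers, sons)[0, 1]
    slope, _intercept = np.polyfit(fathers, sons, 1)
    return r, slope


def compare(seed=0, n_random=1000, n_extra=100):
    rng = np.random.default_rng(seed)

    # Randomly chosen families: fathers' heights have a genuine distribution.
    fathers = rng.normal(MEAN_HEIGHT, SD_HEIGHT, n_random)
    sons = sons_of(fathers, rng)

    # Specially selected fathers of "about" 1.58 m and "about" 1.84 m.
    short = rng.normal(1.58, 0.005, n_extra)
    tall = rng.normal(1.84, 0.005, n_extra)
    extra_fathers = np.concatenate([short, tall])
    extra_sons = sons_of(extra_fathers, rng)

    r_random, slope_random = summarise(fathers, sons)
    r_augmented, slope_augmented = summarise(
        np.concatenate([fathers, extra_fathers]),
        np.concatenate([sons, extra_sons]),
    )
    return {
        "r_random": r_random,
        "slope_random": slope_random,
        "r_augmented": r_augmented,
        "slope_augmented": slope_augmented,
    }


for name, value in compare().items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the correlation coefficient in the swollen sample would be expected to come out larger than in the random sample, because the fathers' heights have been spread out artificially, whereas the fitted slope of sons' height on fathers' height should stay close to its value in the random sample. The correlation therefore depends on how the fathers were chosen, while the dependence of the son's height on the father's does not.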
