Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- 1 The Data Sets
- 2 The Model-Building Process
- 3 Constant Variance Response Models
- 4 Nonconstant Variance Response Models
- 5 Discrete, Categorical Response Models
- 6 Count Response Models
- 7 Time-to-Event Response Models
- 8 Longitudinal Response Models
- 9 Structural Equation Modeling
- 10 Matching Data to Models
- Bibliography
- Index
4 - Nonconstant Variance Response Models
Published online by Cambridge University Press: 03 August 2017
- Frontmatter
- Dedication
- Contents
- Preface
- 1 The Data Sets
- 2 The Model-Building Process
- 3 Constant Variance Response Models
- 4 Nonconstant Variance Response Models
- 5 Discrete, Categorical Response Models
- 6 Count Response Models
- 7 Time-to-Event Response Models
- 8 Longitudinal Response Models
- 9 Structural Equation Modeling
- 10 Matching Data to Models
- Bibliography
- Index
Summary
Heterogeneity in Response Variance
One of the assumptions of the normal regression model is the homogeneity of variance. That is, it is assumed that each response represents an outcome from a (normal) distribution with the same, constant variation, regardless of the values of associated predictors. However, this property of homoscedasticity is often violated by responses that show fluctuations in variation across values of predictors as was seen in Chapter 3 using the four data sets. In such cases researchers will still be interested in evaluating all of the same questions that can be answered using normal regression models, but must turn to alternative modeling techniques.
For example, education researchers may collect records of standardized test scores for students in multiple classrooms with an interest in comparing performance across different student demographics. However, in analyzing the data it may be seen that students in classes with more experienced instructors show more consistency in their performances, and therefore lower variation in scores than for students in classes with less experienced instructors. As another example, health policy researchers might be interested in connections between family income level and health and medical expenditures. When data are collected, data analysts may see that the health spending varies greatly for families with significant financial resources, but health spending is consistent for families less financial resources. In both examples the variability of the outcome of interest changes depending on different subject characteristics.
Within any normal regression model, inferential methods such as t-tests and F-tests are constructed under the assumption of homoscedasticity. Using this assumption, “pooled” estimates of variation are made using the data, including the random unexplained variation captured by the mean-squared error (MSE). If such inferential methods are applied to heteroscedastic data, the pooled estimates such as MSE can either underestimate or overestimate the true variation in the response, and therefore hypothesis-testing procedures will be unreliable, often in unpredictable ways. Therefore it is important for data analysts to be able to detect and correct for violations of the constant-variance assumption.
- Type
- Chapter
- Information
- Publisher: Cambridge University PressPrint publication year: 2017