
Commentary on Coefficient Alpha: A Cautionary Tale

Published online by Cambridge University Press:  01 January 2025

Samuel B. Green*
Affiliation: Arizona State University
Yanyun Yang
Affiliation: Florida State University
*Requests for reprints should be sent to Samuel B. Green, Arizona State University, P.O. Box 870611, Tempe, AZ 85287-0611, USA. E-mail: samgreen@asu.edu

Abstract

The general use of coefficient alpha to assess reliability should be discouraged on a number of grounds. The assumptions underlying coefficient alpha are unlikely to hold in practice, and violation of these assumptions can result in nontrivial negative or positive bias. Structural equation modeling is discussed as an informative process both for assessing the assumptions underlying coefficient alpha and for estimating reliability.
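The abstract contrasts coefficient alpha with an SEM-based estimate of reliability. As a minimal sketch of the two quantities being compared, the code below computes alpha from a matrix of item scores and an omega-type composite reliability from one-factor loadings and error variances. The simulated data, parameter values, and function names are illustrative assumptions (a congeneric one-factor model with uncorrelated errors), not the article's own examples or procedure.

```python
import numpy as np

def coefficient_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons x k_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def omega_reliability(loadings: np.ndarray, error_vars: np.ndarray) -> float:
    """SEM-based composite reliability for a one-factor model with
    uncorrelated errors: (sum of loadings)^2 / total composite variance."""
    true_var = loadings.sum() ** 2
    return true_var / (true_var + error_vars.sum())

# Hypothetical example: four congeneric items with unequal loadings.
rng = np.random.default_rng(0)
loadings = np.array([0.9, 0.7, 0.5, 0.3])
error_sd = np.array([0.4, 0.6, 0.8, 0.9])
factor = rng.normal(size=(500, 1))
items = factor * loadings + rng.normal(size=(500, 4)) * error_sd

print("alpha:", coefficient_alpha(items))
print("omega:", omega_reliability(loadings, error_sd ** 2))
```

Under these assumptions (congeneric items, uncorrelated errors) alpha is a lower bound to the model-based reliability and falls below it when loadings are unequal, which illustrates one source of the negative bias the abstract mentions; correlated errors can push the bias in the other direction.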

Type: Theory and Methods
Copyright © 2008 The Psychometric Society

