
2 - Reliability and measurement error

Published online by Cambridge University Press: 04 August 2010

Thomas B. Newman, University of California, San Francisco
Michael A. Kohn, University of California, San Francisco

Summary

Introduction

A test cannot be useful unless it gives the same or similar results when administered repeatedly to the same individual within a period too short for real biological change to occur. This consistency must hold whether the test is repeated by the same measurer or by different measurers. This desirable characteristic of a test is generally called “reliability,” although some authors prefer “reproducibility.” In this chapter, we look at several ways to quantify the reliability of a test. The appropriate measure of reliability depends on whether the test is administered repeatedly by the same observer, by different observers, or by different methods, as well as on the type of variable being measured. Intra-rater reliability compares results when the test is administered repeatedly by the same observer; inter-rater reliability compares results when measurements are made by different observers. For a continuous variable measured repeatedly in the same individual, the standard deviation and the coefficient of variation quantify the variability among the measurements. Differences between measurements can be random or systematic: we usually assume that differences between repeated measurements by the same observer and method are purely random, whereas differences between measurements by different observers or by different methods can be both random and systematic. The term “bias” refers to systematic differences, distinguishing them from “random error.” The Bland–Altman plot describes reliability in method comparison, in which one measurement method (often established, but invasive, harmful, or expensive) is compared with another method (often newer, easier, or cheaper).
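
To make the continuous-variable case concrete, here is a minimal Python sketch, not from the chapter, of quantifying reliability with the standard deviation and coefficient of variation of repeated measurements in one individual; the readings and variable names are hypothetical:

```python
import statistics

# Hypothetical example: three systolic blood pressure readings (mmHg) taken
# on the same subject within a period too short for real biological change.
readings = [142.0, 138.0, 145.0]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)  # sample standard deviation of the repeats
cv = sd / mean                   # coefficient of variation (unitless)

print(f"mean = {mean:.1f} mmHg")
print(f"standard deviation = {sd:.1f} mmHg")
print(f"coefficient of variation = {cv:.1%}")
```

The coefficient of variation expresses the measurement error relative to the size of the quantity being measured, which makes it easier to compare reliability across tests reported in different units.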
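For the method-comparison case, here is a similar sketch (again with hypothetical data, not the book’s example) of the Bland–Altman calculation: take each pair’s difference and mean, then report the mean difference (the bias) and 95% limits of agreement at the mean difference ± 1.96 standard deviations of the differences:

```python
import statistics

# Hypothetical paired measurements of the same quantity by two methods:
# x from an established (but invasive or expensive) method, y from a
# newer, cheaper method.
x = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.0]
y = [5.4, 4.6, 6.3, 5.9, 5.1, 5.5, 6.6, 5.2]

diffs = [yi - xi for xi, yi in zip(x, y)]        # systematic + random error
means = [(xi + yi) / 2 for xi, yi in zip(x, y)]  # x-axis of the plot

bias = statistics.mean(diffs)       # systematic difference between methods
sd_diff = statistics.stdev(diffs)   # spread of the random disagreement
lower = bias - 1.96 * sd_diff
upper = bias + 1.96 * sd_diff

print(f"bias (mean difference) = {bias:+.2f}")
print(f"95% limits of agreement: {lower:+.2f} to {upper:+.2f}")
# The Bland-Altman plot itself shows each (means[i], diffs[i]) point with
# horizontal lines at bias, lower, and upper.
```

Note that when one method is a reference standard, Krouwer (2008, cited below) argues for plotting the differences against the reference measurement rather than against the pairwise mean.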

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2009


References

Altman, D. G. (1991). Practical Statistics for Medical Research. London, Chapman and Hall.
Bland, J. M., and Altman, D. G. (1986). “Statistical methods for assessing agreement between two methods of clinical measurement.” Lancet 1(8476): 307–10.
Bland, J. M., and Altman, D. G. (1996a). “Measurement error.” Br Med J 313(7059): 744.
Bland, J. M., and Altman, D. G. (1996b). “Measurement error and correlation coefficients.” Br Med J 313(7048): 41–2.
Bland, J. M., and Altman, D. G. (1999). “Measuring agreement in method comparison studies.” Stat Methods Med Res 8(2): 135–60.
Bos, W. J., Goudoever, J., et al. (1996). “Reconstruction of brachial artery pressure from noninvasive finger pressure measurements.” Circulation 94(8): 1870–5.
Feinstein, A. R., and Cicchetti, D. V. (1990). “High agreement but low kappa. I. The problems of two paradoxes.” J Clin Epidemiol 43(6): 543–9.
Gill, M. R., Reiley, D. G., et al. (2004). “Interrater reliability of Glasgow Coma Scale scores in the emergency department.” Ann Emerg Med 43(2): 215–23.
Krouwer, J. S. (2008). “Why Bland-Altman plots should use X, not (Y + X)/2 when X is a reference method.” Stat Med 27(5): 778–80.
Lederle, F. A., Wilson, S. E., et al. (1995). “Variability in measurement of abdominal aortic aneurysms. Abdominal Aortic Aneurysm Detection and Management Veterans Administration Cooperative Study Group.” J Vasc Surg 21(6): 945–52.
Meyer, B. C., Hemmen, T. M., et al. (2002). “Modified National Institutes of Health Stroke Scale for use in stroke clinical trials: prospective reliability and validity.” Stroke 33(5): 1261–6.
Newman, T. B., Browner, W. S., et al. (2007). “Designing studies of medical tests,” in Hulley, S. B., et al., Designing Clinical Research. Philadelphia, PA, Lippincott Williams & Wilkins.
Sackett, D., Haynes, R., et al. (1991). Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston, Little, Brown and Company.
Sinal, S. H., Lawless, M. R., et al. (1997). “Clinician agreement on physical findings in child sexual abuse cases.” Arch Pediatr Adolesc Med 151(5): 497–501.
Sprouse, L. R., 2nd, Meier, G. H., 3rd, et al. (2003). “Comparison of abdominal aortic aneurysm diameter measurements obtained with ultrasound and computed tomography: is there a difference?” J Vasc Surg 38(3): 466–71; discussion 471–2.
Wren, T. A., Liu, X., et al. (2005). “Bone densitometry in pediatric populations: discrepancies in the diagnosis of osteoporosis by DXA and CT.” J Pediatr 146(6): 776–9.
Yen, K., Karpas, A., et al. (2005). “Interexaminer reliability in physical examination of pediatric patients with abdominal pain.” Arch Pediatr Adolesc Med 159(4): 373–6.
