Matched Sampling for Causal Effects

Details

  • 108 tables
  • Page extent: 502 pages
  • Size: 228 x 152 mm
  • Weight: 0.676 kg

Paperback

 (ISBN-13: 9780521674362 | ISBN-10: 0521674360)

My Introduction to Matched Sampling

This volume reprints my publications on matched sampling, or more succinctly, matching, produced during a period of over three decades. My work on matching began just after I graduated college in 1965 and has continued to the present, and beyond, in the sense that there are publications on matching subsequent to those collected here, and I have continuing work in progress on the topic. For most of the years during this period, I believe I was one of the few statistical researchers publishing in this area, and therefore this collection is, I hope, both interesting and historically relevant. In the introduction to each part, I attempt to set the stage for the particular articles in that part. When read together, the part introductions provide a useful overview of developments in matched sampling. In contrast to the earlier years, in the last few years, there have been many other researchers making important contributions to matching. Among these, ones by technically adroit economists and other social scientists are particularly notable, for example: Hahn (1998); Dehejia and Wahba (1999); Lechner (2002); Hansen (2004); Hill, Reiter, and Zanutto (2004); Hirano, Imbens, and Ridder (2004); Imbens (2004); Zhao (2004); Abadie and Imbens (2005); and Diamond and Sekhon (2005). Some of these have had a direct or indirect connection to a course on causal inference I've taught at Harvard for over a decade, sometimes jointly with Guido Imbens.

   My interest in matched sampling started at Princeton University, but not until after I graduated. Nevertheless, it was heavily influenced by my time there. I started my college career in 1961 at Princeton University intending to major in physics, as part of a cohort of roughly 20 kids initially mentored by John Wheeler, who (if memory serves correctly) hoped that we would get PhDs in physics in five years from enrollment as freshmen – we were all lined up earlier for an AB in physics in three years. In retrospect, this was a wildly overambitious agenda, at least for me. For a combination of reasons, including the Vietnam War and Professor Wheeler’s sabbatical at a critical time, I think that no one succeeded in completing the ambitious five-year PhD from entry. We ended up in a variety of departments – I ended up in psychology, but the others in math, chemistry, physics, economics, et cetera. But I still loved the ways of thought in physics and possessed a few of the technical skills, and had some computational skills, which was relatively unusual at the time. Those early computational skills were my entry into the world of matched sampling, and eventually into statistics.

   In the fall of 1965 I entered Harvard University in a PhD program in psychology. This lasted about two weeks, until that department’s PhD advisor decided that my undergraduate program was deficient, primarily because I lacked enough courses in statistics. I had to take some “baby” statistics courses – me, a previous physics type at Princeton! Hrumpf! Feeling insulted, and unwilling to take such courses, I switched to an applied math program, principally computer science, which I could do fairly simply because I had the aforementioned skills as well as independent support through a National Science Foundation graduate fellowship. In the spring of 1966, I got my Master's in applied mathematics. That summer, I lived near Princeton with old Princeton University roommates, and we supported ourselves doing consulting, which was my introduction to matched sampling.

   At that time, Robert Althauser was a junior faculty member in the Department of Sociology at Princeton working on comparisons of white and black students at Temple University. My memory is a bit vague, but I think the objective was to compare white and black students with very similar backgrounds to see if the groups' academic achievements differed. Thus arose the desire to create matched samples of blacks and whites. I was hired primarily as the “computer guy,” programming and implementing algorithms that we created. This work resulted in a methodological publication, Althauser and Rubin (1970), but more important to my career, it stimulated an interest in a topic that has had exciting applications to this day, and fostered interesting statistical theory as well.

   After returning to Harvard for my second and third years in applied math, I realized that there was a department called “Statistics” that seemed to include the study of things that I was already doing. By my fourth year at Harvard, I was a PhD student in that department, with a ready topic for my thesis, and with a fabulous PhD advisor who was also a wonderful human being, William G. Cochran. Bill, who was then one of three senior faculty in the Department of Statistics at Harvard University, the others being Arthur Dempster and Frederick Mosteller, had a powerful influence on me. He taught me what good statistics meant: doing something to address an important real problem. If a project didn’t have some relevance to the real world, Bill’s view was that it might be of interest to some, and that was OK, but it wasn’t of interest to him. It might be great mathematics, but then I should convince a mathematician of that, not him – he was neither interested nor able to make such an assessment. Over the years, I've tried to instill the same attitude in my own PhD students in statistics.

   I submitted my thesis on matched sampling (Rubin, 1970), written under Bill’s direction, in the spring of 1970, and stayed on for one year in the Statistics Department at Harvard, post-PhD, co-teaching the statistics course in psychology (with Bob Rosenthal) that had driven me out of psychology five years earlier. Realizing that life as a junior faculty member was, at that stage, not for me, I moved to the Educational Testing Service (ETS) in the fall of 1971, where I also taught as a visiting faculty member in the new Statistics Department at Princeton. During that time, I continued to spend time at Harvard with Bill, Bob Rosenthal, and Art Dempster (another important influence on my view of statistics), and, of particular relevance here, I continued to refine the ideas and the work in my PhD thesis.

   But enough of ancient personal history. Matching, or matched sampling, refers to the following situation. We have two groups of units, for example, people. One group is exposed to some “treatment” (e.g., cigarette smoking) and another is not exposed, the “controls” (e.g., never smokers). Our focus is on the causal effect of the exposure (e.g., smoking versus not smoking) on outcomes such as lung cancer or coronary disease. Of course, smokers and nonsmokers may differ on background characteristics such as education, and at a first pass, it would make little sense to compare disease rates in well educated nonsmokers and poorly educated smokers. Whence the idea of matched sampling: create samples of smokers and nonsmokers matched on education and other background variables. Or in the Althauser example, even though it may not make sense to talk about the “causal” effect of a person being a white student versus being a black student, it can be interesting to compare whites and blacks with similar background characteristics to see if there are differences in academic achievement, and creating matched black–white pairs is an intuitive way to implement this comparison.

   This technique of comparing “like with like” is intellectually on-target, and it can be formalized in a variety of ways in the context of drawing causal inferences. The topic of matched sampling is not intrinsically about this formalization, but rather concerns the extent to which such matched samples can be constructed and how to construct them, and precisely how well matching works to make the distributions of the background characteristics in the treatment and control groups the same. The formal structure for causal inference that clarifies the role of matching is now commonly called “Rubin’s Causal Model” (Holland, 1986b) for a series of articles written in the 1970s (Rubin, 1974, 1975, 1976a, 1977a, 1978a, 1979a, 1980a). This perspective is presented, for example, in Rubin (2005, 2006) and Imbens and Rubin (2006a), and the first text fully expositing it and many further developments, including the topic of matched sampling, is Imbens and Rubin (2006b).

   One of the critically important characteristics of matching is that it is done without access to any outcome data – only covariate data and treatment information are used. Therefore, when the matched samples are created this way, the investigator cannot realistically be accused of creating matches to obtain some result with respect to the outcomes, even unconsciously. This is of real benefit for honest study design, a point emphasized in a variety of places in this volume, including the concluding “Advice to the Investigator.”

   The reprinted articles are presented in seven parts, each part consisting of between two and five chapters, each of which is a reprinted article. The articles have been reset, with figures, tables, and equations renumbered to reflect the appropriate chapters. The articles reprinted in each part are organized to create a coherent topic, and the parts are organized to reflect a systematic historical development, with the last two parts consisting of examples. A final section provides concluding advice to the investigator.

   Printers' and related errors in the original articles have been corrected, when recognized, and some other stylistic modifications made for consistency. There was no attempt to make notation entirely consistent across chapters.





PART I: THE EARLY YEARS AND THE INFLUENCE OF WILLIAM G. COCHRAN

Even though this book is limited to reprinting publications of mine on matched sampling, it seems useful to provide some background history to the topic. As described in the initial introduction, I had already started working on matching before I met Bill Cochran. But Bill had been working on the design and analysis of observational studies for many years before my appearance.

   I reviewed Cochran’s work on observational studies, including his early papers, in Rubin (1984c), the first chapter in this book. This was originally a chapter written for a volume honoring Cochran’s impact on statistics, edited by Rao and Sedransk (1984). My review starts with Cochran (1953a), which focused on stratified and pair matching on a single covariate, X, and their effects on efficiency of estimation. That chapter continued with Cochran (1965), which was a compendium of advice on the design and analysis of observational studies. Also reviewed was Cochran (1968a), on subclassification and stratified matching, an article that I regard as extremely important. It was a departure from his earlier work on matching, as well as other early work on the effects of matching, all of which focused on the efficiency of estimation (e.g., Wilks (1932), which assumed exact matching with a normally distributed variable; Greenberg (1953), which compared mean matching and regression adjustment; Billewicz (1965), which I’ve always found relatively confusing). Cochran (1968a), however, considered the effect of the matching on the bias of estimates resulting from the matched, in contrast to random, samples, as well as matching’s effects on the precision of estimates (i.e., their efficiency or sampling variance).

   In some sense it is obvious that the issue of bias reduction dominates that of sampling variance reduction when faced with an observational study rather than a randomized experiment, but this was the first publication I know that actually studied bias reduction mathematically, despite “everyone” talking and writing about it when giving practical advice: A precise estimate of the wrong quantity can easily lead one astray. Cochran (1968a) was a wonderful article that in many ways set the tone for what I tried to do in all of my academic work. Not only was there clever mathematics, but there was also highly useful practical advice, such as the oft-quoted comment that matching using five or six well-chosen subclasses of a continuous covariate typically removes about 90% of the initial biasing effects of that covariate.
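
The "about 90%" figure can be checked in a simple idealized case. The following is a minimal sketch, not Cochran's own calculation. It assumes, purely for illustration, a single covariate x distributed N(0, 1) in the control population and N(shift, 1) in the treated population, an outcome that is linear in x, subclass boundaries at equal-probability quantiles of the control distribution, and equal subclass weights. The function names and the default `shift` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed control-group covariate distribution, N(0, 1)


def truncated_mean(lo, hi, shift):
    """Mean of X ~ N(shift, 1) given lo < X < hi (closed-form truncated-normal mean)."""
    a, b = lo - shift, hi - shift
    mass = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    return shift + (STD_NORMAL.pdf(a) - STD_NORMAL.pdf(b)) / mass


def fraction_of_bias_removed(n_subclasses, shift=0.25):
    """Share of the initial bias in x removed by subclassifying on x.

    Assumptions (illustrative only): boundaries are quantiles of the control
    distribution, every subclass gets weight 1 / n_subclasses, and the
    initial bias in x equals `shift`.
    """
    cuts = (
        [-math.inf]
        + [STD_NORMAL.inv_cdf(k / n_subclasses) for k in range(1, n_subclasses)]
        + [math.inf]
    )
    weight = 1.0 / n_subclasses
    remaining = sum(
        weight * (truncated_mean(lo, hi, shift) - truncated_mean(lo, hi, 0.0))
        for lo, hi in zip(cuts, cuts[1:])
    )
    return 1.0 - remaining / shift


if __name__ == "__main__":
    for k in (2, 3, 4, 5, 6):
        print(k, round(fraction_of_bias_removed(k), 3))
```

Under these assumptions the function returns roughly 0.64 for two subclasses and roughly 0.90 for five, in line with the figure quoted above. Other covariate distributions or nonlinear outcomes would give different values.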

   Because of the late stage of Cochran’s research career when I began working with him, we wrote only one paper together, Cochran and Rubin (1973), included as the second chapter of Part I. He had been invited to write an article for a special issue of Sankhyā in honor of the famous Indian statistician, Mahalanobis, and he proposed that we join forces and write up some of his old notes for a book on observational studies that he had shelved in favor of Cochran (1965), plus some of the results in my thesis (subsequently published as Rubin, 1973a,b, both in Part II), along with some of the newer multivariate matching results that I had been developing (subsequently published as Rubin, 1976b,c, both in Part III). This work on discriminant matching, to some extent, anticipated subsequent work on propensity score methods, introduced in Chapter 10. Bill was not only a wonderful writer but a wonderful critic of others’ writing, even teaching me about adding “Cochran’s commas” to add clarity: “You wouldn’t mind if I added a ‘Cochran comma’ here, would you?” I of course agreed to the commas and the article, and we wrote this paper together in 1970–71 when I was at Harvard for the postgraduate year noted in the initial introduction.

   In the article, we addressed the dangers of relying solely on implicit extrapolation through linear regression adjustment on X to control for bias, and documented the improvements that can be achieved when using both matching and regression adjustment on X, noting that the matching is especially important when outcomes are nonlinearly related to X in subtle ways. We also reviewed some of the older Cochran wisdom about the design and analysis of observational studies, the consequences of errors of measurement from Bill’s notes, and discussed multivariate matching methods (e.g., discriminant matching, Mahalanobis-metric matching). Even though the topic of observational studies was not one close to Mahalanobis’s own work, the fact that his name could be attached to a procedure, “Mahalanobis-metric matching,” made the article seem appropriate then for that issue, and with hindsight, even more so because the method has become quite popular (e.g., even currently in economics, Imbens (2004)). The Mahalanobis metric uses the inverse variance–covariance matrix of the matching variables to define the distance between a pair of possible matches, an important inner-product metric.
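
To make the Mahalanobis metric concrete, here is a minimal sketch for two matching variables. It is an illustration under stated assumptions, not the procedure used in the article: the caller supplies the variance-covariance matrix (which sample it should be estimated from is left open here), the matching is a simple greedy nearest-available pass in the order the treated units are listed, and there are at least as many controls as treated units. All function names are hypothetical.

```python
from statistics import mean


def covariance_matrix_2d(points):
    """Sample variance-covariance matrix of a list of (x1, x2) pairs."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return ((s11, s12), (s12, s22))


def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance d' S^{-1} d between points u and v.

    For a 2x2 matrix S, S^{-1} = [[s22, -s12], [-s12, s11]] / det(S).
    """
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = u[0] - v[0], u[1] - v[1]
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def greedy_match(treated, controls, cov):
    """Pair each treated unit with its nearest still-unused control.

    Returns (treated_index, control_index) pairs. Only covariates are
    used; no outcome data enter the matching.
    """
    available = list(range(len(controls)))
    pairs = []
    for t_idx, t in enumerate(treated):
        best = min(available, key=lambda c: mahalanobis_sq(t, controls[c], cov))
        pairs.append((t_idx, best))
        available.remove(best)
    return pairs


if __name__ == "__main__":
    cov = ((4.0, 0.0), (0.0, 1.0))  # first variable has twice the spread
    print(greedy_match([(0.0, 0.0)], [(2.0, 0.0), (0.0, 1.5)], cov))
```

In the usage example the first control is farther away in Euclidean terms (2 versus 1.5) but closer in the Mahalanobis metric (squared distance 1 versus 2.25), because the first variable has the larger variance. The greedy pass is the simplest choice for a sketch; its result depends on the order of the treated units.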

   At the time, there was very little technical work being done on matching. In fact, the only publication I know, not otherwise referenced in this volume, is Carpenter (1977). This Biometrics article had technical results on efficiency, where the implied matching method was Mahalanobis-metric matching.





1. William G. Cochran’s Contributions to the Design, Analysis, and Evaluation of Observational Studies

Donald B. Rubin

1. INTRODUCTION

William G. Cochran worked on statistically rich and scientifically important problems. Throughout his career he participated in the design, analysis, and evaluation of statistical studies directed at important real world problems. The field of observational studies is a perfect example of a general topic that Cochran helped to define and advance with many contributions. Cochran’s work provides an essential foundation for continuing research in this important area of statistics.

   An observational study, for purposes here, is a study intended to assess causal effects of treatments where the rule that governs the assignment of treatments to units is at least partially unknown. Thus a randomized experiment on rats for the effect of smoke inhalation on lung cancer is a controlled experiment rather than an observational study, but an analysis of health records for samples of smokers and nonsmokers from the U.S. population is an observational study. The obvious problem created by observational studies is that there may exist systematic differences between the treatment groups besides treatment exposure, and so any observed differences between the groups (e.g., between smokers and nonsmokers) with respect to an outcome variable (e.g., incidence of lung cancer) might be due to confounding variables (e.g., age, genetic susceptibility to cancer) rather than the treatments themselves. Consequently, a primary objective in the design and analysis of observational studies is to control, through sampling and statistical adjustment, the possible biasing effects of those confounding variables that can be measured; a primary objective in the evaluation of observational studies is to speculate about the remaining biasing effects of those confounding variables that cannot be measured.

   Although observational studies always suffer from the possibility of unknown sources of bias, they are probably the dominant source of information on causal effects of the treatments and certainly are an important supplement to information arising from experiments. Among the reasons for the importance of observational studies are the following. First, quite often the equivalent randomized experiment cannot be conducted for ethical or political reasons, as when studying the effects of in utero exposure to radiation in humans. Second, if the equivalent randomized experiment is possible to conduct in practice, it usually will be far more expensive than the observational study, and its results may not be available for many years, whereas the results from the observational study may be at hand: for example, consider studying the relationship between cholesterol levels in diet and subsequent heart disease. Third, the units in observational studies are usually more representative of the target populations because randomized experiments generally have to be conducted in restricted environments, such as particular hospitals with consenting patients. Fourth, ascertaining which treatments should be studied in future randomized experiments should be based on the analysis of the best available data sets, which are, in early stages of investigation, nearly always observational. Fifth, many studies designed as experiments become more like observational studies when protocols are broken, patients leave the study, measurements are missing, and so on. Sixth, even within properly conducted randomized experiments, many important questions may be observational; for example, in an experiment randomizing patients to either medical or surgical treatment of coronary artery disease, the study of which version of the surgical treatment (e.g., which hospital or the number of bypasses) is most effective is an observational study.

   The ultimate objective of Cochran’s statistical research on observational studies was to provide the investigator with reliable statistical tools and sage advice on their use. On more than one occasion Bill told me that it was better to give the applied researcher a reliable tool that is well understood and will be used properly than a more powerful, but complex and potentially misunderstood tool, that can be easily misapplied. As expected given such an orientation, Cochran seems to have had no a priori favorite methods, good practice being more important than philosophical orientation.

   In this chapter I will try to convey the broad themes of his advice and trace the historical development of ideas in his articles on observational studies. Although some theory will be presented, it is not necessary to present a comprehensive review of technical results from his articles on observational studies because many of these have been summarized in our joint paper “Controlling Bias in Observational Studies: A Review,” published in Sankhyā-A in 1973, here reprinted as Chapter 2. Subsequent reviews of statistical research on observational studies include McKinlay (1975a) and Anderson et al. (1980). This review will focus on Cochran’s contributions, especially those aspects that I find particularly influential and important. Because he wrote exceptionally well and with deep understanding of the underlying issues, it is appropriate to make liberal use of quotations from his works.

   Section 2 summarizes themes of Cochran’s advice on observational studies. These themes appear repeatedly in his articles and deserve this emphasis because of their importance. Memories of his written and verbal advice have been very useful to me when thinking about observational studies such as those on the effectiveness of coaching for the SAT (Messick, 1980), the effect of prenatal exposure to hormones (Reinisch and Karow, 1977), the effectiveness of coronary bypass operations (Murphy et al., 1977), and the effectiveness of private versus public schools (Coleman et al., 1981).

   Sections 3 through 9 summarize his seven major articles on the design, analysis, and evaluation of observational studies. These were published in 1953, 1957, 1965, 1968, 1970, 1972, and 1973 and are major in the sense of being accessible and influential. Section 10 very briefly comments on other work: four proceedings papers with limited distribution, an unpublished technical report, two papers dealing with historical aspects of experimentation, and his 1968 Harvard seminar on observational studies. Section 11 discusses his monograph on observational studies, published posthumously at the end of 1983, but, I believe, written in large part prior to 1972. Finally, Section 12 adds a few more personal comments in the context of Cochran’s attitude towards the interplay of statistical theory and practice.

2. MAJOR THEMES OF ADVICE FOR OBSERVATIONAL STUDIES

Cochran’s opening to “Matching in Analytical Studies,” published in 1953, defines very clearly the relevant issues in observational studies:

 

Most of the following discussion will be confined to studies in which we compare two populations, which will be called the experimental population and the control population. The experimental population possesses some characteristic (called the experimental factor) the effects of which we wish to investigate: It may consist, for example, of premature infants, of physically handicapped men, of families living in public housing, or of inhabitants of an urban area subject to smoke pollution, the experimental factors being, respectively, prematurity, physical handicaps, public housing, and smoke pollution. I shall suppose that we cannot create the experimental population, but must take it as we find it, except that there may be a choice among several populations that are available for study.

 
 

The purpose of the control population is to serve as a standard of comparison by which the effects of the experimental factor are judged. The control population must lack this factor, and ideally it should be similar to the experimental population in anything else that might affect the criterion variables by which the effects of the factor are measured. Occasionally, an ideal control population can be found, but, more usually, even the most suitable control population will still differ from the experimental population in certain properties which are known or suspected to have some correlation with the criterion variables.

 
 

When the control and experimental populations have been determined, the only further resource at our disposal is the selection of the control and experimental samples which are to form the basis of the investigation. Sometimes this choice is restricted, because the available experimental population is so small that it is necessary to include all its members, only the control population being sampled.

 
 

The problem is to conduct the sampling and the statistical analysis of the results so that any consistent differences which appear between the experimental and the control samples can be ascribed with reasonable confidence to the effects of the factor under investigation.

 

   Notice that in this introduction examples are used to make sure the reader knows the kind of study to be discussed. Throughout his publications, Cochran continued to use examples to motivate discussion. In particular, he referred often to studies of smoking and health (e.g., U.S. Surgeon General’s Committee, 1964), preventing poliomyelitis (e.g., Hill and Knowelden, 1950), and exposure to radiation (e.g., Seltser and Sartwell, 1965).

   The major themes of Cochran’s advice on observational studies fit well into three broad categories: design, analysis, and drawing conclusions or evaluation. Although such simplifications are never entirely adequate, this classification serves as a useful guide when reading Cochran’s work.

   In design, Cochran emphasized the need to measure, as well as possible, important variables, both outcome, y, and disturbing, x. Outcome variables are those that are used to assess the effects of the treatments being studied, for example, the presence of lung cancer or polio in medical studies or the score on an achievement test in educational studies. Generally, the purpose of an observational study is to provide insight into how the treatments causally affect the outcome variables. Disturbing (confounding) variables are those that confound the relationship between treatment and y because they may have a different distribution in the experimental group than in the control group. Examples of commonly controlled disturbing variables in studies of human populations include age and gender, as well as measures of pretreatment health in medical studies and pretreatment achievement in educational studies. If the measurements cannot be well taken or the sample sizes would be inadequate, the study may not be worth attempting, and the researchers should consider this possibility.

   A second theme in design is the need for a control group, perhaps several control groups (e.g., for a within-hospital treatment group, both a within-hospital control group and a general-population control group). The rationale for having several control groups is straightforward: If similar estimates of effects are found relative to all control groups, then the effect of the treatment may be thought large enough to dominate the various biases probably existing in the control groups, and thus the effect may be reasonably well estimated from the data. My reading of Cochran’s work suggests that his insistence on using control groups rather than, say, relying on before–after studies, became stronger as he became older. Another theme in the design of observational studies is the desire to avoid control groups with large initial biases. No statistical procedure can be counted on to adjust for large initial differences between experimental and control populations. A final theme for the design of observational studies is to use matched sampling or blocking to reduce initial bias. Incidentally, Cochran viewed pair matching as an excellent applied tool because the investigator could easily see how similar (or not) the paired units were, and therefore did not have to understand or trust the statistician’s fancy adjustment techniques before making judgments about residual bias due to observed disturbing variables.

   In analysis, Cochran studied the properties of two major methods of adjustment: subclassification and covariance (regression) adjustment. With subclassification, the treatment and control samples are stratified (i.e., classified) on the primary disturbing variables, and within-stratum (i.e., within-subclass) estimates of treatment effect are made and then combined to form a global estimate of treatment effect. For example, suppose the primary disturbing variable that is measured in a study of the effect of smoking on lung cancer is age: with subclassification, within each age subclass (e.g., < 25, 25–35, > 35–50, and > 50), the difference between the proportions of smokers and nonsmokers with lung cancer would be the estimated effect specific to that subclass; then these subclass-specific estimates would be weighted by the proportion of the target population in each subclass to form a global estimate of the effect of smoking on lung cancer. With covariance adjustment, a model, usually linear, is fit to the regression of y on x, and is used to create an adjusted estimate of treatment effect of the form $\bar{y}_1 - \bar{y}_2 - \hat{\beta}(\bar{x}_1 - \bar{x}_2)$, where $\bar{y}_1$, $\bar{x}_1$ and $\bar{y}_2$, $\bar{x}_2$ are the y and x means in the experimental and control groups, and $\hat{\beta}$ is an estimate of the slope, $\beta$, of y on x. Cochran generally recommended using these techniques to control disturbing variables and provided the investigator with guidance on how best to use them. Formally, these techniques can be seen as serving two purposes: to increase the precision of comparisons and to remove initial bias due to x. As Cochran’s work on observational studies progressed, he moved from focusing on their effectiveness for increasing precision to their effectiveness for removing bias. There are two interrelated reasons for this shift. First, since it is generally not wise to obtain a very precise estimate of a drastically wrong quantity, the investigator should be more concerned about having an estimate with small bias than one with small variance. Second, since in many observational studies the sample sizes are sufficiently large that sampling variances of estimators will be small, the sensitivity of estimators to biases is the dominant source of uncertainty.
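
The two adjustment methods described in this paragraph can be sketched in a few lines. This is an illustration under stated assumptions rather than Cochran's procedure: the subclass weights are supplied by the caller as target-population proportions, every subclass is assumed to contain both treated and control units, and the slope in the covariance adjustment is taken to be the pooled within-group least-squares slope, which is one common choice that the text does not specify. The function names and the record layout `(x, treated, y)` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def age_subclass(age):
    """Age subclasses from the example: < 25, 25-35, > 35-50, > 50."""
    if age < 25:
        return 0
    if age <= 35:
        return 1
    if age <= 50:
        return 2
    return 3


def subclassification_estimate(records, subclass_of, weights):
    """Weighted sum of within-subclass treated-minus-control mean outcomes.

    records: iterable of (x, treated, y); weights: dict mapping each
    subclass label to its (assumed known) target-population proportion.
    """
    cells = defaultdict(lambda: {True: [], False: []})
    for x, treated, y in records:
        cells[subclass_of(x)][bool(treated)].append(y)
    return sum(
        w * (mean(cells[label][True]) - mean(cells[label][False]))
        for label, w in weights.items()
    )


def covariance_adjusted_estimate(x1, y1, x2, y2):
    """ybar1 - ybar2 - beta_hat * (xbar1 - xbar2).

    beta_hat is the pooled within-group slope (an assumption of this sketch).
    """
    xbar1, ybar1, xbar2, ybar2 = mean(x1), mean(y1), mean(x2), mean(y2)
    sxy = sum((x - xbar1) * (y - ybar1) for x, y in zip(x1, y1))
    sxy += sum((x - xbar2) * (y - ybar2) for x, y in zip(x2, y2))
    sxx = sum((x - xbar1) ** 2 for x in x1) + sum((x - xbar2) ** 2 for x in x2)
    beta_hat = sxy / sxx
    return ybar1 - ybar2 - beta_hat * (xbar1 - xbar2)


if __name__ == "__main__":
    # Treated outcomes follow y = 2x + 3, controls y = 2x: true effect is 3,
    # but the raw difference in means is 5 because the x means differ by 1.
    print(covariance_adjusted_estimate([1, 2, 3], [5, 7, 9], [0, 1, 2], [0, 2, 4]))
```

In the usage example the raw difference in outcome means is 5, of which 2 is due to the difference in x means; the adjustment removes it and recovers 3 because the outcome is exactly linear in x with a common slope. When that linearity fails, the adjustment relies on extrapolation, which is the situation in which combining it with matching or subclassification matters most.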

