Machine Learning for Experiments in the Social Sciences

Published online by Cambridge University Press: 21 March 2023

Jon Green
Affiliation: Northeastern University
Mark H. White, II
Affiliation: Etsy, Inc.

Summary

Causal inference and machine learning are typically introduced in the social sciences separately as theoretically distinct methodological traditions. However, applications of machine learning in causal inference are increasingly prevalent. This Element provides theoretical and practical introductions to machine learning for social scientists interested in applying such methods to experimental data. We show how machine learning can be useful for conducting robust causal inference and provide a theoretical foundation researchers can use to understand and apply new methods in this rapidly developing field. We then demonstrate two specific methods – the prediction rule ensemble and the causal random forest – for characterizing treatment effect heterogeneity in survey experiments and testing the extent to which such heterogeneity is robust to out-of-sample prediction. We conclude by discussing limitations and tradeoffs of such methods, while directing readers to additional related methods available on the Comprehensive R Archive Network (CRAN).
Type: Element
Online ISBN: 9781009168236
Publisher: Cambridge University Press
Print publication: 13 April 2023
