
The micro-task market for lemons: data quality on Amazon's Mechanical Turk

Published online by Cambridge University Press: 25 October 2021

Douglas J. Ahler*, Florida State University, Tallahassee, FL, USA
Carolyn E. Roush, Florida State University, Tallahassee, FL, USA
Gaurav Sood, Independent Researcher

*Corresponding author. Email: dahler@fsu.edu

Abstract

Although Amazon's Mechanical Turk (MTurk) has reduced the cost of collecting original data, researchers in 2018 raised concerns that a large number of bad actors might be operating on the platform. To evaluate data quality on MTurk, we fielded three surveys between 2018 and 2020. We find no evidence of a “bot epidemic,” but significant portions of the data (between 25 and 35 percent) are of dubious quality. The number of IP addresses that completed the survey multiple times or circumvented location requirements fell by almost 50 percent over time, yet suspicious IP addresses are nonetheless more prevalent on MTurk than on other platforms. Furthermore, many respondents appear to answer humorously or insincerely, and this behavior increased by more than 200 percent from 2018 to 2020. Importantly, these low-quality responses attenuate observed treatment effects by approximately 10 to 30 percent.
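The IP screening and the attenuation claim above can be made concrete with a small sketch. The Python below is not the authors' procedure. It assumes each response already carries a country code from some IP geolocation lookup, treats every submission from a repeated IP as suspicious (a blunt rule, since households or shared networks can legitimately share an address), and uses invented toy data in which flagged respondents answer identically across conditions. The names `Response`, `flag_suspicious_ips`, and `difference_in_means` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    ip: str          # IP address recorded for the submission
    country: str     # country code from an IP geolocation lookup (assumed precomputed)
    treated: bool    # True if randomly assigned to the treatment condition
    outcome: float   # measured outcome


def flag_suspicious_ips(responses, required_country="US"):
    """Return IPs that submitted more than once or geolocate outside required_country."""
    counts = Counter(r.ip for r in responses)
    repeated = {ip for ip, n in counts.items() if n > 1}
    out_of_area = {r.ip for r in responses if r.country != required_country}
    return repeated | out_of_area


def difference_in_means(responses):
    """Naive treatment-effect estimate: mean(treated) - mean(control)."""
    treated = [r.outcome for r in responses if r.treated]
    control = [r.outcome for r in responses if not r.treated]
    return mean(treated) - mean(control)


# Hypothetical toy data: sincere respondents show a true effect of 1.0;
# flagged respondents give the same answer regardless of condition.
sample = [
    Response("198.51.100.1", "US", True, 2.0),
    Response("198.51.100.2", "US", True, 2.0),
    Response("198.51.100.3", "US", False, 1.0),
    Response("198.51.100.4", "US", False, 1.0),
    Response("192.0.2.7", "US", True, 1.0),     # same IP submits twice
    Response("192.0.2.7", "US", False, 1.0),
    Response("203.0.113.5", "CA", True, 1.0),   # outside the location requirement
    Response("203.0.113.6", "CA", False, 1.0),
]

flagged = flag_suspicious_ips(sample)
clean = [r for r in sample if r.ip not in flagged]
print(sorted(flagged))              # ['192.0.2.7', '203.0.113.5', '203.0.113.6']
print(difference_in_means(sample))  # 0.5 (diluted estimate)
print(difference_in_means(clean))   # 1.0 (estimate after screening)
```

In this toy sample half the responses are flagged and carry no treatment effect, so the naive estimate is halved. Real low-quality responses need not be pure noise, so the attenuation observed in real data need not equal the share of flagged responses.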

Type: Original Article
Copyright © The Author(s), 2021. Published by Cambridge University Press on behalf of the European Political Science Association


Supplementary material: Ahler et al. Dataset
Supplementary material: Ahler et al. supplementary material (PDF, 1 MB)