Selection Bias Encountered in the Systematic Linking of Historical Census Records

Published online by Cambridge University Press: 06 July 2020

Luiza Antonie
Affiliation:
School of Computer Science, University of Guelph
Kris Inwood*
Affiliation:
Department of Economics and Finance, University of Guelph; Department of History, University of Guelph
Chris Minns
Affiliation:
Department of Economic History, London School of Economics and Political Science
Fraser Summerfield
Affiliation:
Department of Economics, St Francis Xavier University

Abstract

Linked historical records typically are unrepresentative of the population from which they are drawn even if the method of linking is restricted to time-invariant matching criteria. An example drawn from Canadian census records illustrates the nature of bias that may afflict even a carefully linked sample. The use of potentially time-varying match criteria doubles the size of a linked sample at a modest cost in terms of additional bias. This trade-off is attractive for some research purposes if care is taken in the uses to which the data are put. Reweighting to mitigate the effects of bias in visible characteristics is desirable.

Type
Special Issue Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press on behalf of the Social Science History Association
