Part III - Statistical Framework

Published online by Cambridge University Press:  05 July 2014

Julia Lane
Affiliation:
American Institutes for Research, Washington DC
Victoria Stodden
Affiliation:
Columbia University, New York
Stefan Bender
Affiliation:
Institute for Employment Research of the German Federal Employment Agency
Helen Nissenbaum
Affiliation:
New York University

Summary

Statistical Framework

If big data are to be used for the public good, the inference that is drawn from them must be valid for different, targeted populations. For that to occur, statisticians have to access the data so that they may understand the data-generating process, know whether the assumptions of their statistical model are met, and see what relevant information is included or excluded. It is clear from earlier chapters in this book that the utility of big data lies in being able to study small groups in real time, using new data analytic techniques, such as machine learning or data mining. These demands pose real challenges for anonymization and statistical analysis. The essays in this part of the book identify the issues, spell out the statistical framework for both analysis and data release, and outline key directions for future research.

A major theme of the essays is that neither the data-generating process nor the data collection process is well understood for big data. As Kreuter and Peng argue, almost all statistical experience with human subjects is based on survey data, and over time statisticians have parsed the sources of error neatly into a total survey error framework. But the data-generating process of many data streams – such as administrative data or big data – is less transparent and is not under the control of the researcher; therefore, access to the data themselves is critical to building the necessary understanding. Continuous effort will be needed to develop standards of transparency in the collection of big data. Transparency is also needed on the ‘back end’ – any linkage, data preparation and processing, analysis, and reporting – to ensure reproducibility. Kreuter and Peng point out that much more research is needed on linkage and matching, because the resulting knowledge will not only enrich possible analysis, but also help to evaluate the quality of the linked sources.

Type: Chapter
Information: Privacy, Big Data, and the Public Good: Frameworks for Engagement, pp. 253-256
Publisher: Cambridge University Press
Print publication year: 2014
