Book contents
- Frontmatter
- dedication
- Contents
- List of figures and tables
- About the authors
- Foreword
- Introduction
- PART 1 MEMORY, PRIVACY AND TRANSPARENCY
- PART 2 THE PHYSICAL WORLD: OBJECTS, ART AND ARCHITECTURE
- PART 3 DATA AND PROGRAMMING
- 8 Preparing and releasing official statistical data
- 9 Sharing research data, data standards and improving opportunities for creating visualisations
- 10 Open source, version control and software sustainability
- Final thoughts
- Index
8 - Preparing and releasing official statistical data
from PART 3 - DATA AND PROGRAMMING
Published online by Cambridge University Press: 08 June 2019
Summary
Introduction
In this chapter, we provide an overview of the preparation needed to release statistical data to researchers and the public. This involves protecting the confidentiality of data subjects while maintaining high-quality data. Our focus is on the statistical disclosure limitation (SDL) methods used by statistical agencies and data custodians of official data sources. We also refer to a large body of work in the computer science literature on protecting the privacy of data subjects, known as differential privacy. We distinguish between confidentiality, which in the statistical literature refers to the guarantee given to respondents of surveys and censuses that the personal information they share with the statistical agency will not be divulged, and privacy, which refers to every data subject's right not to share their personal information.
The aim of SDL is to prevent sensitive information about individual respondents from being disclosed. SDL is becoming increasingly important owing to growing demands for accessible data provided by statistical agencies. The statistical agency has a legal obligation to maintain the confidentiality of statistical entities, and in many countries there are codes of practice that must be strictly adhered to. In addition, statistical agencies have a moral and ethical obligation towards respondents who participate in surveys and censuses through confidentiality guarantees presented prior to their participation. The key objective is to encourage public trust in official statistics production and hence ensure high response rates.
The information released by statistical agencies can be divided into two major forms of statistical data: tabular data and microdata. Whereas tables have been commonly released by statistical agencies for decades, the release of microdata to researchers is a relatively new phenomenon. Many statistical agencies have provisions for releasing microdata from social surveys for research purposes, usually under special licence agreements and through secure data archives. Microdata from business surveys, which are partially collected by census and contain very sensitive data, are typically not released.
In order to preserve the confidentiality of respondents, statistical agencies must assess the disclosure risk in statistical data and, if required, choose appropriate SDL methods to apply to the data. Measuring disclosure risk involves assessing and evaluating numerically the risk of re-identifying statistical units. SDL methods perturb, modify or summarise the data in order to prevent re-identification by a potential attacker.
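As a rough illustration of the two steps described above, the following Python sketch first estimates a simple disclosure risk and then applies one SDL method. It is a minimal sketch under stated assumptions, not the procedure used by any particular agency: the risk measure (the share of records whose combination of key variables occurs fewer than k times), the choice of key variables, the helper names and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def cell_counts(records, keys):
    """Count how many records share each combination of key-variable values."""
    return Counter(tuple(r[key] for key in keys) for r in records)


def share_at_risk(records, keys, k=2):
    """Fraction of records whose key combination occurs fewer than k times.

    With k=2 this is the share of sample uniques, a simple (assumed) proxy
    for re-identification risk.
    """
    counts = cell_counts(records, keys)
    at_risk = sum(
        1 for r in records if counts[tuple(r[key] for key in keys)] < k
    )
    return at_risk / len(records)


def recode_age(record, width=10):
    """Global recoding: replace an exact age with a band such as '30-39'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy, invented microdata (not taken from any real survey).
records = [
    {"age": 31, "sex": "F", "region": "North"},
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 36, "sex": "F", "region": "North"},
    {"age": 52, "sex": "M", "region": "South"},
    {"age": 58, "sex": "M", "region": "South"},
    {"age": 47, "sex": "M", "region": "North"},
]
keys = ["age", "sex", "region"]  # assumed identifying (key) variables

before = share_at_risk(records, keys)
after = share_at_risk([recode_age(r) for r in records], keys)
print(f"share of records at risk before recoding: {before:.2f}")
print(f"share of records at risk after recoding:  {after:.2f}")
```

In this toy example, coarsening age into ten-year bands lowers the share of unique records at the cost of less detailed data, which is the trade-off between disclosure risk and data quality that the chapter refers to.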
- Type: Chapter
- Information: Partners for Preservation: Advancing Digital Preservation through Cross-Community Collaboration, pp. 147-166
- Publisher: Facet
- Print publication year: 2018