4 - Data Collection and File Creation Phase
Published online by Cambridge University Press: 28 January 2021
Summary
According to the Data Seal of Approval, the data producer's responsibility for the quality of his or her research data complements the data archive's capability to provide access and long-term preservation (DANS, 2009). Following best practice when building both the data and the documentation components of a collection is therefore critical. This section describes best practices for creating research data that conform to widely accepted norms for quantitative, GIS, qualitative, and other types of social science data.
Quantitative Data
Dataset creation and integrity
Transcribing data from a questionnaire or interview schedule into a data record can introduce several types of errors, including typing errors, invalid codes, and records that do not match. For this reason, a data collection strategy that captures data directly during the interview is recommended. Consistency checks can then be built into data collection with CATI/CAPI (computer-assisted telephone/personal interviewing) software, so that problems are corrected during the interview itself.
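A consistency check of the kind CATI/CAPI software runs can be thought of as a set of rules evaluated against the answers collected so far. The sketch below is a minimal illustration under stated assumptions: the variable names (`age`, `years_in_job`, `num_children`, `youngest_child_age`), the two rules, and the function `check_consistency` are hypothetical and do not reflect any particular CATI/CAPI package.

```python
from typing import Callable, Dict, List

# A rule returns True when the answers collected so far are consistent.
Rule = Callable[[dict], bool]

# Hypothetical consistency rules; a real instrument would define its own.
RULES: Dict[str, Rule] = {
    # Years in the current job cannot exceed the respondent's age.
    "job_tenure_le_age": lambda r: r["years_in_job"] <= r["age"],
    # A respondent with no children should not report a youngest child's age.
    "children_vs_youngest_age": lambda r: (
        r["num_children"] > 0 or r.get("youngest_child_age") is None
    ),
}


def check_consistency(responses: dict) -> List[str]:
    """Return the names of rules violated by the responses collected so far."""
    failed = []
    for name, rule in RULES.items():
        try:
            if not rule(responses):
                failed.append(name)
        except KeyError:
            # A question this rule depends on has not been asked yet; skip it.
            continue
    return failed


# Example: tenure longer than age is flagged so the interviewer can re-ask.
print(check_consistency({"age": 30, "years_in_job": 35, "num_children": 0}))
```

In an interview program, a check like this would run after each relevant answer, so the interviewer can resolve the conflict with the respondent rather than after fieldwork ends.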
When data must instead be transcribed (either from survey forms or from published tables), several steps can be taken in advance to reduce errors.
• Separate the coding and data-entry tasks as much as possible, and perform coding in a setting that minimizes distractions.
• Arrange to have particularly complex tasks, such as occupation coding, carried out by one person or by a team of persons specially trained for the task.
• Use a data-entry program designed to catch typing errors, that is, one preprogrammed to detect out-of-range values.
• Perform double entry of the data, in which each record is keyed in and then rekeyed against the original. Several standard packages offer this feature; during re-entry, the program flags discrepancies immediately.
• Carefully check the first 5 to 10 percent of the data records created, and then choose random records for quality-control checks throughout the process.
• Let the computer do complex coding and recoding where possible. For example, to create a series of variables describing family structure, write computer code to perform the task. Computer-generated codes are accurate as long as the instructions are accurate, and the code can easily be changed to correct a logical or programming error.
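A data-entry program that detects out-of-range values typically checks each keyed value against the codebook. The following is a minimal sketch, assuming a hypothetical codebook (`CODEBOOK`) in which each variable has either a set of valid codes or an inclusive numeric range; the variables, codes, and the function `out_of_range` are illustrative only.

```python
# Hypothetical codebook: each variable maps to a set of valid codes or an
# inclusive (low, high) range. A real codebook comes from the study design.
CODEBOOK = {
    "sex": {1, 2, 9},          # 1 = male, 2 = female, 9 = missing
    "age": (0, 110),           # plausible ages, inclusive
    "hours_worked": (0, 99),   # weekly hours, inclusive
}


def out_of_range(record: dict) -> dict:
    """Return {variable: value} for every keyed value outside the codebook."""
    problems = {}
    for var, valid in CODEBOOK.items():
        if var not in record:
            continue  # variable not keyed for this record
        value = record[var]
        if isinstance(valid, tuple):
            low, high = valid
            ok = low <= value <= high
        else:
            ok = value in valid
        if not ok:
            problems[var] = value
    return problems


# A mistyped sex code and an implausible hours value are both caught.
print(out_of_range({"sex": 3, "age": 45, "hours_worked": 120}))
```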
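The logic behind double entry can be illustrated by comparing two independent keyings of the same forms. The sketch below is a simplification: it compares two complete passes in a batch, whereas the packages described above flag discrepancies during re-entry. The function name `compare_double_entry` and the data layout (record ID mapped to a dict of variable values) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Keying = Dict[int, Dict[str, object]]  # record ID -> {variable: keyed value}


def compare_double_entry(first: Keying, second: Keying) -> List[Tuple]:
    """Return (record_id, variable, first_value, second_value) for each mismatch.

    Records or variables keyed in only one pass appear with None on the
    other side, so omissions are reported as well as typing errors.
    """
    mismatches = []
    for rec_id in sorted(set(first) | set(second)):
        a = first.get(rec_id, {})
        b = second.get(rec_id, {})
        for var in sorted(set(a) | set(b)):
            va, vb = a.get(var), b.get(var)
            if va != vb:
                mismatches.append((rec_id, var, va, vb))
    return mismatches


pass_one = {1: {"age": 34, "sex": 2}, 2: {"age": 51, "sex": 1}}
pass_two = {1: {"age": 43, "sex": 2}, 2: {"age": 51, "sex": 1}}
# The transposed digits in record 1 are reported for checking against the form.
print(compare_double_entry(pass_one, pass_two))
```

Each reported mismatch is then resolved by consulting the original paper form, not by choosing one of the two keyed values.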
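Selecting records for quality control, as recommended above, can be automated: check the first portion of records in full, then draw random records from the rest. In this minimal sketch the 10 percent initial share falls within the 5 to 10 percent suggested in the text, while the 5 percent random spot-check rate and the function name `qc_sample` are assumptions, not figures from the guide.

```python
import random
from typing import List, Optional


def _ceil_pct(n: int, pct: int) -> int:
    """Integer ceiling of n * pct / 100, avoiding floating-point rounding."""
    return (n * pct + 99) // 100


def qc_sample(n_records: int, initial_pct: int = 10, random_pct: int = 5,
              seed: Optional[int] = None) -> List[int]:
    """Return record indices to check by hand.

    The first `initial_pct` percent of records are all included; a random
    `random_pct` percent of the remaining records is added as spot checks.
    """
    rng = random.Random(seed)  # seeded so the sample can be reproduced
    n_initial = _ceil_pct(n_records, initial_pct)
    initial = list(range(min(n_initial, n_records)))
    remainder = range(len(initial), n_records)
    n_random = min(len(remainder), _ceil_pct(len(remainder), random_pct))
    spot_checks = rng.sample(remainder, n_random)
    return initial + sorted(spot_checks)


# For 1,000 records: records 0-99 plus 45 randomly chosen later records.
indices = qc_sample(1000, seed=1)
print(len(indices))
```

Fixing the seed makes the selection reproducible, which is useful if a second reviewer needs to recheck the same records.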
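The family-structure example above can be sketched as a function that derives household-level variables from a roster. This is a minimal illustration, assuming a hypothetical relationship-to-head coding scheme (`HEAD`, `SPOUSE`, `CHILD`, `OTHER`) and hypothetical category labels; a real study would use its own codebook and definitions.

```python
from collections import Counter
from typing import List

# Hypothetical relationship-to-household-head codes used in the roster.
HEAD, SPOUSE, CHILD, OTHER = 1, 2, 3, 4


def family_structure(roster: List[int]) -> dict:
    """Derive family-structure variables from each member's relationship code."""
    counts = Counter(roster)
    n_children = counts[CHILD]
    has_spouse = counts[SPOUSE] > 0
    if has_spouse and n_children:
        family_type = "couple with children"
    elif has_spouse:
        family_type = "couple without children"
    elif n_children:
        family_type = "single parent"
    elif len(roster) == 1:
        family_type = "single person"
    else:
        family_type = "other"
    return {
        "household_size": len(roster),
        "n_children": n_children,
        "has_spouse": has_spouse,
        "family_type": family_type,
    }


print(family_structure([HEAD, SPOUSE, CHILD, CHILD]))
```

Because the classification rules live in one function, correcting a logical error (for example, how to classify a household with a spouse and other relatives) means editing the rules once and rerunning them on every record, rather than recoding records by hand.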
- Type: Chapter
- Information: Preparing Data for Sharing: Guide to Social Science Data Archiving, pp. 15-30
- Publisher: Amsterdam University Press
- Print publication year: 2012