German Social Science Infrastructure Service

The data processing of social science datasets

Acquisition, Processing and Archiving - from raw data to prepared studies

Providing the interested public with primary material and the results of empirical studies is a driving force behind the development of the ZA database. Before datasets can be used for secondary analysis, wide-ranging checks and processing work are, as a rule, necessary. The individual steps in archiving a dataset follow the standards and procedures set out below, so that high-quality data can be distributed. These procedures meet international archival standards, which are designed to take advantage of the latest technical developments and to meet the new requirements of social science.

1. Data acquisition and depositors

  • The starting point is the continuing acquisition of studies from empirical social research, in some cases at the request of a user. The Archive tries to agree release arrangements for studies that are as differentiated as possible. Among the data depositors are higher education institutions as well as market and opinion research institutes, scientific foundations and associations, both national and international.
  • At the same time, individual researchers can approach the Archive themselves for the archiving, preparation and sale of their data.
    Assessment of the data beyond this does not, however, take place at the Zentralarchiv. Detailed information on archiving can be found in the Instructions for depositors.

For all basic questions on archiving social science datasets, the contact is:

  • Oliver Watteler, M.A., Tel. +49 (0)221-47694-76

2. Preliminary checks of data and documentation in the Zentralarchiv

The datasets are usually deposited in the Archive on diskette. In addition to this computer-readable data, the Archive requires access to the necessary documents (questionnaire, coding frame) and supplementary information on the study. These are assessed in the preliminary checks.

Tests of the primary documents for completeness:
  • Is the questionnaire present?
  • Is the coding frame complete?
  • Is the documentary material sufficient for a methodological description of the study?

Technical checking of the storage medium and testing of the data:
  • Is the storage medium readable and virus-free?
  • Has the correct dataset been deposited?
  • Is the number of cases correct?
  • Are there differences between questionnaire, coding frame and data?
  • Are there undefined codes (so-called "wild codes") or duplicate cases?
  • Comparison of the first frequency count of the data with the documents supplied by the data depositor.
  • Error correction.

Any coding errors and logical inconsistencies found are corrected; in addition, the frequency counts are checked against the coding frame. The aim of the preliminary checks is a dataset in which all cases are complete, meaningfully identified and fully consistent with the coding frame.
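
Some of the data tests listed above (case count, duplicate cases, wild codes) can be sketched in a few lines of Python. This is only an illustration, not the Archive's actual procedure: the function name, the `case_id` field and the example coding frame are all hypothetical.

```python
from collections import Counter

def preliminary_checks(cases, coding_frame, expected_n, id_field="case_id"):
    """Return a dict describing problems found in a list of case records (dicts)."""
    report = {"case_count_ok": len(cases) == expected_n}

    # Duplicate cases: the same identifier appears more than once.
    id_counts = Counter(case[id_field] for case in cases)
    report["duplicate_ids"] = sorted(i for i, n in id_counts.items() if n > 1)

    # Wild codes: values that the coding frame does not define for a variable.
    wild = []
    for case in cases:
        for variable, valid_codes in coding_frame.items():
            value = case.get(variable)
            if value not in valid_codes:
                wild.append((case[id_field], variable, value))
    report["wild_codes"] = wild
    return report

# Hypothetical coding frame and data, for illustration only.
coding_frame = {"sex": {1, 2, 9}, "vote": {1, 2, 3, 8, 9}}
cases = [
    {"case_id": 1, "sex": 1, "vote": 2},
    {"case_id": 2, "sex": 2, "vote": 7},  # 7 is not defined for "vote"
    {"case_id": 2, "sex": 2, "vote": 3},  # duplicate identifier
]
report = preliminary_checks(cases, coding_frame, expected_n=3)
```

The report only flags problems; deciding how each one is corrected still requires the questionnaire and the depositor's documents.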

3. Data preparation to the Zentralarchiv standard
In the next processing phase, the data are prepared in a data format which allows the analyst to use modern analysis software such as SAS or SPSS.
According to the level of processing, the studies are assigned to preparation categories 1, 2 or 3.
Wide-ranging archive standards have been developed for the formatting of the data and of the accompanying information (codebook, study description). These standards meet the requirements of modern data analysis systems and are agreed with the international partner archives of the Zentralarchiv on a regular basis.

Documented editing and testing of the data:
  • Filters are checked: do they agree, or are the data inconsistent?
  • "Missing values" are set to standardised values, which are the same for all studies.
  • Variable descriptions are specified.
  • Variable labels and value labels are defined and written.
  • An SPSS setup is prepared, which allows the direct production of system files or portable files.
  • All editing and cleaning steps are documented.
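
As a rough illustration of two of these steps, the Python sketch below recodes study-specific missing codes to shared values and flags one kind of filter inconsistency (a respondent routed past a question who nevertheless has a substantive answer). The shared codes 0, 8 and 9 and all variable names are assumptions; the text does not state the actual Zentralarchiv values.

```python
# Hypothetical shared missing-value codes; the real standard values are not given in the text.
STANDARD_MISSING = {"not_applicable": 0, "dont_know": 8, "no_answer": 9}

def standardise_missing(values, study_missing):
    """Map a study's own missing codes to the shared codes.

    study_missing maps a study-specific code to a category name,
    for example {98: "dont_know", 99: "no_answer"}.
    """
    recoded = []
    for value in values:
        category = study_missing.get(value)
        recoded.append(value if category is None else STANDARD_MISSING[category])
    return recoded

def filter_violations(cases, filter_var, filter_value, follow_up):
    """Return cases routed past follow_up that still carry a substantive answer."""
    skipped = STANDARD_MISSING["not_applicable"]
    return [
        case for case in cases
        if case[filter_var] != filter_value and case[follow_up] != skipped
    ]

vote_clean = standardise_missing([1, 3, 98, 2, 99], {98: "dont_know", 99: "no_answer"})

cases = [
    {"voted": 1, "party": 3},  # passed the filter, answered the follow-up
    {"voted": 2, "party": 0},  # routed past the follow-up, correctly "not applicable"
    {"voted": 2, "party": 4},  # routed past the follow-up but has an answer
]
inconsistent = filter_violations(cases, "voted", 1, "party")
```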
Integration into a cumulative dataset:

Datasets with waves at several time points, or collected in different countries, undergo an additional level of processing in order to meet the growing demand for comparative data.
In several intermediate steps, the heterogeneous data structures of the individual datasets are harmonised to a common standard and the separate datasets integrated into a single dataset. This is carried out, for example, for the ALLBUS, Eurobarometer, ISSP, Politbarometer and the Election Studies.
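
The harmonisation step can be pictured as renaming each dataset's variables to common names, recoding values where the coding differs, and then stacking the results. The following Python sketch assumes hypothetical variable names, codes and wave years; it is not the procedure used for any of the series named above.

```python
def harmonise(cases, variable_map, recodes, wave):
    """Rename variables and recode values of one dataset to a common standard."""
    harmonised = []
    for case in cases:
        row = {"wave": wave}  # keep track of where each case came from
        for source_name, target_name in variable_map.items():
            value = case[source_name]
            # Apply a recode if one is defined for this target variable and value.
            row[target_name] = recodes.get(target_name, {}).get(value, value)
        harmonised.append(row)
    return harmonised

# Two hypothetical waves with different variable names and codings.
wave_1990 = [{"v5": 1, "v12": "m"}, {"v5": 2, "v12": "f"}]
wave_1992 = [{"q3": 2, "q8": 1}]

cumulative = (
    harmonise(wave_1990, {"v5": "vote", "v12": "sex"}, {"sex": {"m": 1, "f": 2}}, wave=1990)
    + harmonise(wave_1992, {"q3": "vote", "q8": "sex"}, {}, wave=1992)
)
```

Keeping the mapping and recode tables as data, rather than burying them in code, also serves as documentation of what was changed for each source dataset.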

4. Codebook Production
Parallel to the technical data processing, preparation of the text for selected studies is carried out:

  • structured preparation of the questionnaire with complete question text and all answer categories in Zentralarchiv standard format.
  • combination of text and cleaned data in a codebook.

The end product of the study preparation routines (Preparation Category 1) is a computer-readable codebook, which is also available in printed format. It includes the complete documentation of the dataset that is important for secondary analysis:

  • context information for the survey (in some cases, with an additional methodological report)
  • variable labels
  • complete question and answer texts
  • location of the variables in the dataset
  • notes on filters and routing, and on (country) specific codes
  • unweighted data descriptives with absolute and relative frequencies for each variable
  • additional notes, for example on coding of occupations
  • variable list
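
To make the "absolute and relative frequencies" item concrete, here is a minimal Python sketch that builds an unweighted frequency table for one variable and formats a plain-text codebook entry. The layout and all names are assumptions and do not reproduce the Zentralarchiv standard format.

```python
from collections import Counter

def frequency_table(values, value_labels):
    """Return (code, label, absolute count, percentage) rows, sorted by code."""
    total = len(values)
    if total == 0:
        return []
    counts = Counter(values)
    return [
        (code, value_labels.get(code, ""), counts[code], round(100 * counts[code] / total, 1))
        for code in sorted(counts)
    ]

def codebook_entry(name, label, question, values, value_labels):
    """Combine question text and unweighted frequencies into one text block."""
    lines = [f"{name}  {label}", f"Q: {question}"]
    for code, text, n, pct in frequency_table(values, value_labels):
        lines.append(f"{code:>3}  {text:<12} {n:>5} {pct:>6.1f}%")
    return "\n".join(lines)

# Hypothetical variable, for illustration only.
labels = {1: "Yes", 2: "No", 9: "No answer"}
table = frequency_table([1, 1, 2, 9], labels)
entry = codebook_entry("V10", "Voted in last election", "Did you vote?", [1, 1, 2, 9], labels)
```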

5. Study description
From the content and methodological information, a separate study description is produced. This serves in particular the systematic exploitation of Zentralarchiv data holdings through:

  • summary of the contents of the questionnaire
  • methodological-technical description of the study
  • inclusion of the study description in the data catalogue
  • announcements of new datasets in the ZA-Information (Zentralarchiv Information)

6. Archiving
The computer-readable data and original documents are saved:

  • storage of the prepared dataset as a supplement to the original dataset and the machine-readable codebook.
  • preparation of a security backup copy and storage in a separate location.
  • preservation of the original documents (questionnaire and so on) through the scanning of the text and its electronic storage.

7. Information retrieval
To help locate existing studies or comparable topics, sometimes down to the level of individual variables, the study descriptions and the machine-readable codebooks are indexed and entered into a retrieval system. For internal searches, ISYS, for example, can be used. Online, various search tools are available for searching the holdings of the Zentralarchiv (Data Holdings Catalogue, Data Holdings Index) and its data collections (ALLBUS, EUROBAROMETER, ISSP).
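
The general idea of indexing descriptions for retrieval can be shown with a small inverted index in Python. This is a generic sketch, not how ISYS or the online catalogues work; the study identifiers and texts are invented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each term to the set of document identifiers that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return identifiers of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical study descriptions.
descriptions = {
    "S001": "Political attitudes and election behaviour",
    "S002": "Attitudes towards work and family",
}
index = build_index(descriptions)
hits = search(index, "election attitudes")
```

Indexing codebook question texts in the same way would allow searches at variable level rather than study level.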

8. Supply of data
Data and context information are ordered within the framework of the Usage and charging regulations; orders can be placed by post or via the Internet. Further information on locating and supplying data and documentation, and on how these relate to one another, is brought together in a separate overview.

© GESIS Erwin Rose 21.05.2007