German Social Science Infrastructure Service

The processing of social science studies and datasets

Acquisition, processing and archiving - from raw data to prepared studies

The provision of primary material and results of empirical studies to the interested public is a driving force behind the development of the GESIS-ZA database. To make datasets usable for secondary analysis, wide-ranging checks and processing work are, as a rule, necessary. The individual steps in the archiving of datasets follow the standards and procedures set out below, so that high-quality data can be distributed. These procedures meet international archiving standards, which are designed to take advantage of the latest technical developments and to fulfil the new requirements of the social sciences.

1. Data acquisition and depositors

  • The starting point is the continuing acquisition of studies from empirical social research, in some cases at the request of a user. The archive aims to offer finely differentiated arrangements for the release of studies. The data depositors include higher education institutions as well as market and opinion research institutes, scientific foundations and associations, both national and international.
  • At the same time individual researchers can approach the archive themselves for the archiving, preparation and sale of their data. Assessment of the data beyond this does not, however, take place at the Zentralarchiv.
  • Detailed information on archiving can be found in the Instructions for depositors.

For further help and all questions on archiving social science datasets please contact

  • Oliver Watteler, M.A., Tel. +49 (0)221-47694-76

2. Preliminary checks of data and documentation in the Zentralarchiv

The datasets are usually deposited in the archive on disks. In addition to this computer-readable data, the archive requires access to the necessary documents (questionnaire, coding frame) and supplementary information on the study. These are assessed in the preliminary checks.

Tests of the primary documents for completeness:
  • Is the questionnaire at hand?
  • Is the coding frame complete?
  • Is the documentary material sufficient for a methodological description of the study?
Technical check of the storage medium and data test:
  • Is the storage medium readable and virus-free?
  • Has the correct dataset been deposited?
  • Is the number of cases correct?
  • Are there differences between questionnaire, coding frame and data?
  • Are there undefined codes (so-called "wild codes") or duplicate cases?
  • Comparison of the initial frequency count of the data with the documents supplied by the data depositor
  • Error correction

Any coding errors and logical inconsistencies are corrected; in addition, the frequency counts are checked against the coding documentation. The aim of the preliminary checks is to create a dataset in which all cases are complete, clearly identified and fully consistent with the coding.
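
The technical checks listed above (case count, duplicate cases, wild codes) can be illustrated with a minimal Python sketch. This is not the archive's actual tooling; the coding frame, the variable names and the function `check_dataset` are hypothetical and serve only to show the kind of test involved.

```python
from collections import Counter

# Hypothetical coding frame: variable name -> set of codes defined for it.
CODING_FRAME = {
    "sex": {1, 2, 9},
    "vote": {1, 2, 3, 8, 9},
}


def check_dataset(records, coding_frame, expected_cases, id_field="case_id"):
    """Report case-count mismatches, duplicate cases and wild codes."""
    id_counts = Counter(rec[id_field] for rec in records)
    report = {
        # Does the number of cases match the depositor's documentation?
        "case_count_ok": len(records) == expected_cases,
        # Case identifiers that occur more than once.
        "duplicate_ids": sorted(cid for cid, n in id_counts.items() if n > 1),
        # (case id, variable, value) for every value not in the coding frame.
        "wild_codes": [],
    }
    for rec in records:
        for var, valid_codes in coding_frame.items():
            value = rec.get(var)
            if value not in valid_codes:
                report["wild_codes"].append((rec[id_field], var, value))
    return report


# Illustrative data only.
records = [
    {"case_id": 1, "sex": 1, "vote": 2},
    {"case_id": 2, "sex": 2, "vote": 7},  # 7 is not defined in the coding frame
    {"case_id": 2, "sex": 2, "vote": 3},  # duplicate case identifier
]
report = check_dataset(records, CODING_FRAME, expected_cases=3)
```

A report of this kind only flags problems; deciding how to correct them still requires the questionnaire and the depositor's documents.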

3. Data preparation to the Zentralarchiv standard

  • In the next processing phase the data is converted to a format which allows the analyst to use modern analysis software such as SAS or SPSS.
  • According to the level of processing the studies are assigned to preparation categories 1, 2 or 3.
  • Wide-ranging archive standards have been developed for the formatting of data and of the accompanying information (codebook, study description), which meet the requirements of modern data analysis systems and which are harmonised with the international partner archives of the Zentralarchiv on a regular basis.
Documented editing and testing of the data, e.g.:
  • filters are checked for correctness and the data for inconsistencies
  • "missing values" are set to standardised values, which are the same for all studies
  • variable descriptions are specified
  • variable labels and value labels are defined and written
  • an SPSS setup is prepared, which allows the direct production of system or portable files
  • all editing and cleaning steps are documented
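
Setting "missing values" to standardised values while documenting each change can be sketched as follows. The standard codes (98, 99), the per-study mapping and the function name are assumptions made for illustration; the text does not state which values the Zentralarchiv actually uses.

```python
# Hypothetical archive-wide missing-value codes (not taken from the source).
STANDARD_MISSING = {"dont_know": 98, "no_answer": 99}

# Hypothetical study-specific codes: variable -> {original code: missing category}.
STUDY_MISSING = {
    "vote": {-1: "no_answer"},
    "party": {0: "dont_know"},
}


def standardise_missing(records, study_missing, standard=STANDARD_MISSING):
    """Return recoded copies of the records plus a log of every change."""
    cleaned, log = [], []
    for rec in records:
        new_rec = dict(rec)  # leave the deposited record untouched
        for var, mapping in study_missing.items():
            old = new_rec.get(var)
            if old in mapping:
                new = standard[mapping[old]]
                new_rec[var] = new
                # Document the editing step: case, variable, old and new value.
                log.append((rec["case_id"], var, old, new))
        cleaned.append(new_rec)
    return cleaned, log


# Illustrative data only.
records = [
    {"case_id": 1, "vote": -1, "party": 3},
    {"case_id": 2, "vote": 2, "party": 0},
]
cleaned, log = standardise_missing(records, STUDY_MISSING)
```

Keeping the original records unchanged and returning a change log mirrors the requirement that all editing and cleaning steps be documented.
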
Integration into a cumulative dataset:
  • Individual datasets from surveys conducted in waves at several time points or in several countries undergo an additional level of processing in order to meet the growing demand for comparative data.
  • In several intermediate steps the heterogeneous data structures of the individual datasets are harmonised to a common standard and the separate datasets integrated into a single dataset.
  • This is carried out, for example, for the ALLBUS, Eurobarometer, ISSP, Politbarometer and the election studies.
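
The harmonisation step can be pictured as mapping each dataset's own variable names and codes onto a common target scheme before the cases are combined. The sketch below assumes two hypothetical waves with invented variable names (`v12`, `q5` and so on); it does not reproduce the actual procedure used for any of the studies named above.

```python
def harmonise(record, mapping, study_id):
    """Translate one record into the common scheme.

    mapping: target variable -> (source variable, {source code: target code}).
    Codes without an entry in the recode table are carried over unchanged.
    """
    out = {"study": study_id}
    for target_var, (source_var, recode) in mapping.items():
        value = record.get(source_var)
        out[target_var] = recode.get(value, value)
    return out


# Hypothetical mappings for two waves with different structures.
MAP_WAVE_A = {"sex": ("v12", {}), "vote": ("v30", {0: 99})}
MAP_WAVE_B = {"sex": ("q5", {"m": 1, "f": 2}), "vote": ("q17", {})}

# Illustrative data only.
wave_a = [{"v12": 1, "v30": 0}]
wave_b = [{"q5": "f", "q17": 2}]

# Integrate the separately harmonised datasets into one cumulative dataset.
cumulative = (
    [harmonise(r, MAP_WAVE_A, "wave_a") for r in wave_a]
    + [harmonise(r, MAP_WAVE_B, "wave_b") for r in wave_b]
)
```

Recording the source study in every harmonised case keeps the origin of each record traceable in the cumulative file.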

4. Codebook production

In parallel with the technical data processing, the text is prepared for selected studies:

  • structured preparation of the questionnaire with complete question text and all answer categories in Zentralarchiv standard format.
  • combination of text and cleaned data in a codebook.

The final product of the study preparation routines (preparation category 1) is a computer-readable codebook, which is also available in printed format. It contains the complete documentation of the dataset that is important for secondary analysis:

  • context information for the survey (in some cases with an additional methodological report)
  • variable labels
  • complete question and answer texts
  • location of the variables in the dataset
  • notes on filters and routing and on (country-) specific codes
  • unweighted sample count with absolute and relative frequencies for each variable
  • additional notes, for example on the coding of occupations
  • variable list
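
The unweighted count with absolute and relative frequencies that a codebook reports for each variable can be computed along the following lines. The function name, the value labels and the data are hypothetical.

```python
from collections import Counter


def frequency_table(values, value_labels):
    """Return (code, label, absolute count, percent) rows for one variable."""
    counts = Counter(values)
    n = len(values)
    rows = []
    for code in sorted(counts):
        label = value_labels.get(code, "(unlabelled)")
        percent = round(100 * counts[code] / n, 1)
        rows.append((code, label, counts[code], percent))
    return rows


# Illustrative data only.
labels = {1: "Yes", 2: "No", 9: "No answer"}
values = [1, 1, 2, 2, 2, 9, 1, 2]
table = frequency_table(values, labels)
```

Codes that appear in the data without a label are listed as "(unlabelled)", which makes remaining wild codes visible in the codebook draft.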

5. Study description

A separate study description is produced from the content and methodological information. It serves, in particular, the systematic indexing of the Zentralarchiv data holdings through:

  • summary of contents of the questionnaire
  • methodological-technical description of the study
  • inclusion of the study description in the data catalogue
  • announcements of new datasets

6. Archiving

The computer-readable data and original documents are saved:

  • storage of the prepared dataset as a supplement to the original dataset and the machine-readable codebook.
  • preparation of a security backup copy and storage in a separate location.
  • preservation of the original documents (questionnaire and so on) through the scanning of the text and its electronic storage.

7. Information retrieval

To make it possible to locate existing studies or comparable topics at variable level, the study descriptions as well as the machine-readable codebook are sometimes indexed and entered into a retrieval system. Various GESIS-ZA search engines (Data Holdings Catalogue, ZACAT, ZA QBase) and its data collections are available online.
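
Indexing at variable level can be pictured as a simple inverted index over question texts. The following sketch illustrates the general idea only; it makes no claim about how the Data Holdings Catalogue, ZACAT or ZA QBase are implemented, and the study identifiers and texts are invented.

```python
import re
from collections import defaultdict


def build_index(variables):
    """Map each word to the (study, variable) pairs whose text contains it."""
    index = defaultdict(set)
    for study, var, text in variables:
        for token in set(re.findall(r"[a-z]+", text.lower())):
            index[token].add((study, var))
    return index


def search(index, query):
    """Return the variables whose text contains every word of the query."""
    tokens = re.findall(r"[a-z]+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Illustrative entries only: (study id, variable name, question text).
variables = [
    ("S1", "v10", "Satisfaction with democracy"),
    ("S1", "v11", "Trust in parliament"),
    ("S2", "q3", "Trust in democracy and parliament"),
]
index = build_index(variables)
```

A query such as `search(index, "trust parliament")` then returns comparable variables across studies rather than whole studies.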

8. Supply of data

Data and context information are ordered within the framework of the Usage and charging regulations; orders can be placed by post or via the Internet. Further information on the location, supply and acquisition of data and documentation is combined in a separate overview.

© GESIS Oliver Watteler 14.12.2007