The data processing of social science datasets
Acquisition, Processing and Archiving - from raw data to prepared
studies
Making primary material and the results of empirical studies available to
the interested public is a driving force behind the development of the ZA
database. Before a dataset can be used for secondary analysis,
wide-ranging checks and processing work are, as a rule, necessary. The
individual steps in archiving a dataset follow the standards and
procedures set out below, so that high-quality data can be distributed.
They meet international archival standards, which are designed to take
advantage of the latest technical developments and to meet the new
requirements of social science.
1. Data acquisition and depositors
- The starting point is the continuing acquisition of studies from empirical
social research, in some cases at the request of a user. The Archive
tries to agree release arrangements for each study that are as finely
differentiated as possible. Data depositors include higher education
institutions as well as market and opinion research institutes, scientific
foundations and associations, both national and international.
- At the same time, individual researchers can approach the Archive
themselves for the archiving, preparation and sale of their data.
Assessment of the data beyond this does not, however, take place at the
Zentralarchiv. Detailed information on archiving can be found in the
Instructions for depositors.
For all basic questions on archiving of social science datasets, the
contact is
- Oliver Watteler, M.A., Tel. +49 (0)221-47694-76, E-Mail
2. Preliminary checks of data and documentation in the Zentralarchiv
The datasets are usually deposited in the Archive on diskette. In
addition to this computer-readable data, the Archive requires access to
the necessary documents (questionnaire, coding frame) and supplementary
information on the study. These are assessed in the preliminary checks.
Tests of the primary documents for completeness:
- Is the questionnaire present?
- Is the coding frame complete?
- Is the documentary material sufficient for a methodological description
of the study?
Technical checking of the storage medium and testing of the data:
- Is the storage medium readable and virus-free?
- Has the correct dataset been deposited?
- Is the number of cases correct?
- Are there differences between questionnaire, coding frame and data?
- Are there undefined codes (so-called "wild codes") or duplicate cases?
- Comparison of the first frequency count of the data with the
documents supplied by the data depositor.
- Error correction.
Any coding errors and logical inconsistencies found are corrected; in
addition, the frequency count is checked against the coding frame. The aim
of the preliminary checks is to create a dataset in which all cases are
complete, meaningfully identified and fully consistent with the coding
frame.
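The checks for case count, duplicate cases and wild codes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coding frame, the variable names (`sex`, `vote`), the `case_id` field and the function `preliminary_checks` are hypothetical and are not taken from the Zentralarchiv's own tools.

```python
from collections import Counter

# Hypothetical coding frame: variable name -> set of valid codes.
CODING_FRAME = {
    "sex": {1, 2, 9},
    "vote": {1, 2, 3, 4, 9},
}


def preliminary_checks(cases, coding_frame, expected_n, id_field="case_id"):
    """Report case-count mismatch, duplicate case IDs and wild codes."""
    # Case IDs that occur more than once are treated as duplicate cases.
    id_counts = Counter(case[id_field] for case in cases)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # A wild code is any value not defined in the coding frame.
    wild_codes = []
    for case in cases:
        for variable, valid_codes in coding_frame.items():
            value = case.get(variable)
            if value not in valid_codes:
                wild_codes.append((case[id_field], variable, value))

    return {
        "case_count_ok": len(cases) == expected_n,
        "duplicate_ids": duplicate_ids,
        "wild_codes": wild_codes,
    }


# Invented example data, for illustration only.
cases = [
    {"case_id": 1, "sex": 1, "vote": 3},
    {"case_id": 2, "sex": 2, "vote": 7},  # 7 is not in the coding frame
    {"case_id": 2, "sex": 2, "vote": 1},  # duplicate case ID
]
report = preliminary_checks(cases, CODING_FRAME, expected_n=3)
print(report)
```

The report only lists problems; deciding how to correct them still requires the questionnaire and the depositor's documents.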
3. Data preparation to the Zentralarchiv standard
In the next processing phase, the data are prepared in a data format which
allows the analyst to use modern analysis software such as SAS or SPSS.
According to the level of processing, the studies are assigned to
preparation categories 1, 2 or 3.
Wide-ranging archive standards have been developed for the formatting of
the data and of the accompanying information (codebook, study description).
They meet the requirements of modern data analysis systems and are agreed
with the international partner archives of the Zentralarchiv on a regular
basis.
Documented editing and testing of the data:
- Filters are checked: do they agree, or are the data inconsistent?
- "Missing values" are set to standardised values, which are the same
for all studies.
- Variable descriptions are specified.
- Labels for variable names and value labels are defined and written.
- An SPSS setup is prepared, which allows the direct production of
system files or portable files.
- All editing and cleaning steps are documented.
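Two of the steps above, standardising missing values and checking filters, can be sketched as follows. The standard codes (0 and 9), the variable names and both functions are assumptions made for illustration; the text does not state the actual Zentralarchiv missing-value codes.

```python
# Hypothetical standard missing-value codes (not given in the text).
STANDARD_MISSING = {"not_applicable": 0, "no_answer": 9}


def standardise_missing(cases, variable, source_codes):
    """Recode study-specific missing codes to the standard codes.

    source_codes maps a study-specific code to a key of STANDARD_MISSING.
    Returns the recoded cases and a log of every change made, so that
    the editing step is documented.
    """
    recoded, log = [], []
    for case in cases:
        new_case = dict(case)  # leave the deposited data untouched
        old = case[variable]
        if old in source_codes:
            new = STANDARD_MISSING[source_codes[old]]
            if new != old:
                new_case[variable] = new
                log.append((case["case_id"], variable, old, new))
        recoded.append(new_case)
    return recoded, log


def filter_violations(cases, filter_var, filter_value, target_var):
    """Return IDs of cases that break a filter rule.

    Rule: target_var is asked only when filter_var == filter_value;
    all other cases must carry the "not applicable" code.
    """
    not_applicable = STANDARD_MISSING["not_applicable"]
    violations = []
    for case in cases:
        asked = case[filter_var] == filter_value
        has_value = case[target_var] != not_applicable
        if asked != has_value:
            violations.append(case["case_id"])
    return violations


# Invented example: "party" is asked only of respondents with voted == 1.
cases = [
    {"case_id": 1, "voted": 1, "party": 3},
    {"case_id": 2, "voted": 2, "party": -1},  # study-specific "not applicable"
    {"case_id": 3, "voted": 2, "party": 4},   # answered despite the filter
    {"case_id": 4, "voted": 1, "party": 99},  # study-specific "no answer"
]
recoded, log = standardise_missing(
    cases, "party", {-1: "not_applicable", 99: "no_answer"}
)
violations = filter_violations(recoded, "voted", 1, "party")
print(log)
print(violations)
```

Returning a change log alongside the recoded cases mirrors the requirement that all editing and cleaning steps are documented.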
Integration into a cumulative dataset:
Individual datasets from waves at several time points, or from
different countries, undergo an additional level of processing in
order to meet the growing demand for comparative data.
In several intermediate steps, the heterogeneous data structures of
the individual datasets are harmonised to a common standard and the
separate datasets integrated into a single dataset. This is carried
out, for example, for the ALLBUS, Eurobarometer, ISSP,
Politbarometer and the Election Studies.
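As a rough illustration of harmonisation and cumulation, the sketch below maps wave-specific variable names and codes onto one common standard and then stacks the waves into a single dataset. The wave names, the source variables (`v12`, `q3`) and the recodes are invented; the actual harmonisation rules for ALLBUS, Eurobarometer and the other collections are not described in the text.

```python
# Hypothetical mapping per wave:
# source variable -> (target variable, recode of source codes).
HARMONISATION = {
    "wave_1990": {"v12": ("sex", {"m": 1, "f": 2})},
    "wave_1992": {"q3": ("sex", {1: 1, 2: 2})},
}


def harmonise(wave, cases, mapping):
    """Rename and recode one wave to the common standard."""
    harmonised = []
    for case in cases:
        row = {"wave": wave, "case_id": case["case_id"]}
        for source_var, (target_var, recode) in mapping.items():
            # A KeyError here means a source code has no harmonised
            # equivalent yet, which should be resolved, not hidden.
            row[target_var] = recode[case[source_var]]
        harmonised.append(row)
    return harmonised


def cumulate(waves):
    """Integrate all harmonised waves into one cumulative dataset."""
    cumulative = []
    for wave, cases in waves.items():
        cumulative.extend(harmonise(wave, cases, HARMONISATION[wave]))
    return cumulative


# Invented example data.
waves = {
    "wave_1990": [{"case_id": 1, "v12": "m"}, {"case_id": 2, "v12": "f"}],
    "wave_1992": [{"case_id": 1, "q3": 2}],
}
cumulative = cumulate(waves)
for row in cumulative:
    print(row)
```

Keeping the wave identifier on every row means case IDs only need to be unique within a wave.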
4. Codebook Production
Parallel to the technical data processing, preparation of the text for
selected studies is carried out:
- structured preparation of the questionnaire with complete question
text and all answer categories in Zentralarchiv standard format.
- combination of text and cleaned data in a codebook.
The end product of the study preparation routines (Preparation Category
I) is a computer-readable codebook, which is also available in printed
format. It includes complete documentation of the dataset, which is
important for secondary analysis:
- context information for the survey (in some cases, with an
additional methodological report)
- variable labels
- complete question and answer texts
- location of the variables in the dataset
- notes on filters and routing, and on (country) specific codes
- unweighted data descriptives with absolute and relative frequencies
for each variable
- additional notes, for example on coding of occupations (notes)
- variable list
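The unweighted descriptives listed above amount to a frequency table per variable. A minimal sketch, assuming a hypothetical variable with invented codes and value labels, and rounding percentages to one decimal place:

```python
from collections import Counter


def frequency_table(values, value_labels):
    """Unweighted absolute and relative frequencies for one variable.

    Returns rows of (code, label, absolute frequency, per cent).
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    for code in sorted(counts):
        n = counts[code]
        label = value_labels.get(code, "(unlabelled)")
        rows.append((code, label, n, round(100 * n / total, 1)))
    return rows


# Invented example variable and labels.
value_labels = {1: "male", 2: "female", 9: "no answer"}
values = [1, 2, 2, 1, 2, 9, 2, 1]
table = frequency_table(values, value_labels)
for code, label, n, percent in table:
    print(f"{code:>3}  {label:<12} {n:>5} {percent:>6.1f}")
```

Codes that occur in the data but have no value label are shown as "(unlabelled)" rather than dropped, so the table also exposes gaps in the labelling.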
5. Study description
From the content and methodological information, a separate study
description is produced. This serves in particular the systematic
exploitation of Zentralarchiv data holdings through:
- summary of the contents of the questionnaire
- methodological-technical description of the study
- inclusion of the study description in the data catalogue
- announcements of new datasets in the ZA-Information (Zentralarchiv
Information)
6. Archiving
The computer-readable data and original documents are saved:
- storage of the prepared dataset as a supplement to the original
dataset and the machine-readable codebook.
- preparation of a security backup copy and storage in a separate
location.
- preservation of the original documents (questionnaire and so on)
through the scanning of the text and its electronic storage.
7. Information retrieval
To locate existing studies or comparable topics, sometimes at variable
level, the study descriptions and the machine-readable codebooks are
indexed and entered into a retrieval system. For internal searches,
ISYS, for example, can be used. Online, various search engines on the
Internet allow searches of the Zentralarchiv holdings (Data Holdings
Catalogue, Data Holdings Index) and its data collections (ALLBUS,
EUROBAROMETER, ISSP).
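The idea of indexing study descriptions and codebook entries for retrieval can be illustrated with a small inverted index. This is a generic sketch and does not reflect how ISYS or the online catalogues work internally; the document identifiers and texts are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Invented entries: two study descriptions and one variable-level
# codebook entry (question text).
documents = {
    "S001": "Election study: voting intention and party preference",
    "S002": "Survey on party membership and trade unions",
    "S001:v12": "Which party would you vote for?",
}
index = build_index(documents)
print(search(index, "party preference"))
print(search(index, "party"))
```

Indexing question texts under their own identifiers is what makes a search "at variable level" possible alongside study-level hits.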
8. Supply of data
Data and context information are ordered within the framework of the
Usage and charging regulations; orders can be placed by post or via the
Internet. Further information on the location and supply of data and
documentation, as well as their interrelationships, is combined in a
separate overview.
© GESIS Erwin Rose
21.05.2007