The processing of social science studies and datasets
Acquisition, processing and archiving - from raw data to prepared
studies
The provision of primary material and results of empirical studies to the
interested public is a driving force behind the development of the GESIS-ZA
database. To make datasets usable for secondary analysis, wide-ranging
checks and processing work are, as a rule, necessary. The individual steps
in the archiving of datasets follow the standards and procedures set out
below, in order to allow the distribution of high-quality data. They meet
international archiving standards, which are designed to take advantage of
the latest technical developments and to fulfil the new requirements of the
social sciences.
1. Data acquisition and depositors
- The starting point is the continuing acquisition of studies from empirical
social research, in some cases at the request of a user. The archive
aims to offer the most finely differentiated arrangements possible for the
release of studies. The data depositors include higher education
institutions as well as market and opinion research institutes, scientific
foundations and associations, both national and international.
- At the same time individual researchers can approach the archive
themselves for the archiving, preparation and sale of their data.
Assessment of the data beyond this does not, however, take place at the
Zentralarchiv.
- Detailed information on archiving can be found in the Instructions for
depositors.
For further help and all questions on archiving social science
datasets please contact
- Oliver Watteler, M.A., Tel. +49 (0)221-47694-76, E-Mail
2. Preliminary checks of data and documentation in the Zentralarchiv
The datasets are usually deposited in the archive on disks. In addition
to this computer-readable data, the archive requires access to the
necessary documents (questionnaire, coding frame) and supplementary
information on the study. These are assessed in the preliminary checks.
Tests of the primary documents for completeness:
- Is the questionnaire at hand?
- Is the coding frame complete?
- Is the documentary material sufficient for a methodological
description of the study?
Technical check of the storage medium and data test:
- Is the storage medium readable and virus-free?
- Has the correct dataset been deposited?
- Is the number of cases correct?
- Are there differences between questionnaire, coding frame and
data?
- Are there undefined codes (so-called "wild codes") or
duplicate cases?
- Comparison of the first counting of data with the
documents supplied by the data depositor
- Error correction
Any coding errors and logical inconsistencies are corrected; in
addition, the frequency counts are checked against the coding frame. The
aim of the preliminary checks is the creation of a dataset in which all
cases are complete, clearly identified and fully consistent with the
coding frame.
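The data tests listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the archive's actual procedure: the coding frame, the variable names (`sex`, `vote`), the `case_id` field and the function `preliminary_check` are all hypothetical.

```python
from collections import Counter

# Hypothetical coding frame: variable name -> set of codes it allows.
CODING_FRAME = {
    "sex": {1, 2, 9},
    "vote": {1, 2, 3, 4, 8, 9},
}

def preliminary_check(cases, coding_frame, expected_cases, id_field="case_id"):
    """Report case-count mismatch, duplicate case IDs and wild codes."""
    # Duplicate cases: any identifier that occurs more than once.
    id_counts = Counter(case[id_field] for case in cases)
    duplicates = sorted(cid for cid, n in id_counts.items() if n > 1)

    # Wild codes: values that the coding frame does not define.
    wild_codes = []
    for case in cases:
        for variable, allowed in coding_frame.items():
            value = case.get(variable)
            if value not in allowed:
                wild_codes.append((case[id_field], variable, value))

    return {
        "case_count_ok": len(cases) == expected_cases,
        "duplicates": duplicates,
        "wild_codes": wild_codes,
    }

cases = [
    {"case_id": 1, "sex": 1, "vote": 3},
    {"case_id": 2, "sex": 2, "vote": 7},  # 7 is not defined for "vote"
    {"case_id": 2, "sex": 9, "vote": 1},  # case_id 2 appears twice
]
report = preliminary_check(cases, CODING_FRAME, expected_cases=3)
print(report)
```

A report of this kind only flags problems; deciding how to correct them still requires the questionnaire and the depositor's documents.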
3. Data preparation to the Zentralarchiv standard
- In the next processing phase the data is prepared in a data format which
allows the analyst to use modern analysis software such as SAS or SPSS.
- According to the level of processing the studies are assigned to
preparation categories 1, 2 or 3.
- Wide-ranging archive standards have been developed for the formatting of
data and of the accompanying information (codebook, study description),
which meet the requirements of modern data analysis systems and which are
harmonised with the international partner archives of the Zentralarchiv on a
regular basis.
Documented editing and testing of the data, e.g.:
- are the filters correct or is the data inconsistent?
- "missing values" are set to standardised values,
which are the same for all studies
- variable descriptions are specified
- labels for variable names and value labels are defined and
written
- an SPSS setup is prepared, which allows the direct production
of system or portable files
- All editing and cleaning steps are documented
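As an illustration of setting "missing values" to standardised values while documenting each change, the following Python sketch recodes study-specific missing codes and keeps a log. The standard codes (-8, -9), the per-study mappings and the function name `standardise_missing` are assumptions made for this example; they are not the actual Zentralarchiv values.

```python
# Hypothetical archive-wide missing-value codes (assumed for illustration).
STANDARD_MISSING = {"dont_know": -8, "no_answer": -9}

# Hypothetical study-specific codes: variable -> {original code: missing type}.
STUDY_MISSING = {
    "vote": {8: "dont_know", 9: "no_answer"},
    "income": {99998: "dont_know", 99999: "no_answer"},
}

def standardise_missing(cases, study_missing, standard=STANDARD_MISSING):
    """Return recoded copies of the cases plus a log of every change made."""
    recoded, log = [], []
    for index, case in enumerate(cases):
        new_case = dict(case)  # leave the deposited data untouched
        for variable, mapping in study_missing.items():
            original = case.get(variable)
            if original in mapping:
                new_case[variable] = standard[mapping[original]]
                # Document the editing step: case, variable, old and new value.
                log.append((index, variable, original, new_case[variable]))
        recoded.append(new_case)
    return recoded, log

cases = [
    {"vote": 3, "income": 2500},
    {"vote": 8, "income": 99999},
]
recoded, log = standardise_missing(cases, STUDY_MISSING)
print(recoded)
print(log)
```

Working on copies and keeping a log mirrors the requirement that all editing and cleaning steps remain documented.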
Integration into a cumulative dataset:
- Individual datasets with waves at several time points as well as
in different countries undergo an additional level of processing in
order to meet the growing demand for comparative data.
- In several intermediate steps the heterogeneous data structures of
the individual datasets are harmonised to a common standard and the
separate datasets integrated into a single dataset.
- This is carried out, for example, for the ALLBUS, Eurobarometer, ISSP,
Politbarometer and the election studies.
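The harmonisation of heterogeneous datasets into one cumulative dataset might look roughly like the Python sketch below. The wave names, source variables (`v12`, `q3` and so on), target variables and recode maps are invented for illustration; real harmonisation rules are study-specific and considerably more detailed.

```python
# Hypothetical rules per wave: source variable -> (target variable, recode map).
RULES = {
    "wave_1990": {
        "v12": ("sex", {1: 1, 2: 2}),
        "v40": ("trust", {1: 1, 2: 2, 3: 3}),
    },
    "wave_1995": {
        "q3": ("sex", {"m": 1, "f": 2}),
        "q17": ("trust", {1: 3, 2: 2, 3: 1}),  # scale was reversed in this wave
    },
}

def harmonise(wave, cases, rules):
    """Map one wave's variables and codes onto the common standard."""
    harmonised = []
    for case in cases:
        row = {"wave": wave}
        for source, (target, recode) in rules[wave].items():
            # Codes without a rule become None so they can be reviewed later.
            row[target] = recode.get(case.get(source))
        harmonised.append(row)
    return harmonised

def cumulate(waves, rules):
    """Integrate the harmonised waves into a single dataset."""
    cumulative = []
    for wave, cases in waves.items():
        cumulative.extend(harmonise(wave, cases, rules))
    return cumulative

waves = {
    "wave_1990": [{"v12": 1, "v40": 3}],
    "wave_1995": [{"q3": "f", "q17": 1}],
}
cumulative = cumulate(waves, RULES)
print(cumulative)
```

Keeping the wave identifier on every row preserves the origin of each case within the integrated dataset.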
4. Codebook production
In parallel with the technical data processing, the text for selected
studies is prepared:
- structured preparation of the questionnaire with complete question
text and all answer categories in Zentralarchiv standard format.
- combination of text and cleaned data in a codebook.
The final product of the study preparation routines (Preparation category
I) is a computer-readable codebook, which is also available in printed
format. It includes complete documentation of the dataset which is
important for secondary analysis:
- context information for the survey (in some cases with an
additional methodological report)
- variable labels
- complete question and answer texts
- location of the variables in the dataset
- notes on filters and routing and on (country-) specific codes
- unweighted sample count with absolute and relative frequencies
for each variable
- additional notes, for example on coding of occupations (notes)
- variable list
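The unweighted count with absolute and relative frequencies for each variable can be illustrated with a short Python sketch. The function `frequency_table` and the sample values are hypothetical and only show the kind of table a codebook entry contains.

```python
from collections import Counter

def frequency_table(values):
    """Return (code, absolute frequency, percentage) rows, sorted by code."""
    counts = Counter(values)
    total = len(values)
    return [
        (code, n, round(100 * n / total, 1))
        for code, n in sorted(counts.items())
    ]

# Hypothetical unweighted answers to one question (9 = no answer).
votes = [1, 2, 2, 3, 2, 1, 9, 2]
table = frequency_table(votes)
for code, absolute, percent in table:
    print(f"{code:>4} {absolute:>4} {percent:>6.1f}%")
```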
5. Study description
From the content and methodological information a separate study
description is produced. This in particular serves the systematic
indexing of Zentralarchiv data holdings through:
- summary of contents of the questionnaire
- methodological-technical description of the study
- inclusion of the study description in the data catalogue
- announcements of new datasets
6. Archiving
The computer-readable data and original documents are saved:
- storage of the prepared dataset as a supplement to the original
dataset and the machine-readable codebook.
- preparation of a security backup copy and storage in a separate
location.
- preservation of the original documents (questionnaire and so on)
through the scanning of the text and its electronic storage.
7. Information retrieval
To help locate existing studies or comparable topics at variable
level, the study descriptions as well as the machine-readable
codebook are sometimes indexed and entered into a retrieval system.
Various GESIS-ZA search engines (Data Holdings Catalogue, ZACAT,
ZA QBase) and its data collections are available online.
8. Supply of data
Data and context information are ordered within the framework of the
Usage and charging regulations; orders can be placed by post or via the
Internet. Further information on the location and supply of data and
documentation, as well as on their acquisition, is combined in a separate
overview.
© GESIS Oliver Watteler
14.12.2007