Preparing Data for Submission

Before data are submitted, an archive agreement must be signed by the archive and the depositor. This agreement covers arrangements regarding usage rights, authenticity, data protection responsibilities, and disposal. The depositor also selects an access category (Usage Regulations) defining the conditions under which data and documentation are released.

Data can be submitted once outstanding legal questions have been answered and it has been established that the depositor has the authority to transfer non-exclusive rights for archiving and dissemination to the archive.

The depositor compiles a Submission Information Package (SIP), which should include the following:

  • Machine-readable dataset
  • Codebook for raw data
  • Survey instrument(s)/measurement instrument(s): originals or copies, in a digital format if possible
  • Methodology report (Erstellung von Methodenberichten für die Archivierung von Forschungsdaten, "Preparing methodology reports for the archiving of research data")

PDF files must be free of protection; otherwise they cannot be processed, e.g. for migration into other data formats.
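A depositor might want to screen PDF files for protection before sending them. The following is a minimal sketch, not an archive tool: it relies on the fact that an encrypted PDF declares an /Encrypt entry in its trailer, and simply searches the raw bytes for that name. The function names are hypothetical, and the byte search is a heuristic that can give false positives if the string happens to appear elsewhere in the file.

```python
from pathlib import Path

def looks_protected(pdf_bytes):
    """Heuristic: protected PDFs declare an /Encrypt entry in their trailer."""
    return b"/Encrypt" in pdf_bytes

def find_protected_pdfs(folder):
    """Return sorted names of *.pdf files in `folder` that look protected."""
    return sorted(
        path.name
        for path in Path(folder).glob("*.pdf")
        if looks_protected(path.read_bytes())
    )
```

Calling find_protected_pdfs on the folder holding the documentation lists the files that should be re-exported without protection before submission.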

Recommended formats

Choosing suitable file formats is particularly important for the long-term preservation, interpretability, and accessibility of data. Like hardware, software evolves constantly: new functions are added to programs, or software is adapted to new operating systems. Both can lead to changes in the file format. As a consequence, digital data are constantly at risk from changes in the hardware and software environment. This risk can be mitigated by using suitable file formats.

The GESIS Data Archive recommends using the following formats for the most important object classes:

Submitted datasets should be usable in one of the widely used statistical packages (SPSS, Stata or SAS). More specifically, data can be submitted in the following forms:

1. As so-called system files in the proprietary formats of common statistical packages (e.g. SPSS System File).

2. In software-specific portable file formats (e.g. SAS Transport File).

3. As text files (comma-, tab-delimited formats) with the required setup or syntax files to enable importing into statistical packages.  
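The third option, a delimited text file accompanied by a setup file, can be illustrated with a short sketch. It writes a comma-delimited file and generates SPSS syntax (GET DATA, VARIABLE LABELS, MISSING VALUES) that reads it back with variable labels and missing-value codes. The variable names, formats, labels, and the missing code -9 are invented for illustration, and the generated syntax should be checked against the SPSS version in use before submission.

```python
import csv

# Hypothetical data definitions: name -> (SPSS format, label, missing code or None)
VARIABLES = {
    "id": ("F8.0", "Respondent ID", None),
    "age": ("F3.0", "Age in years", -9),
    "sex": ("F1.0", "Sex of respondent", -9),
}

def write_data(path, rows):
    """Write rows (a list of dicts) as comma-delimited text with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(VARIABLES))
        writer.writeheader()
        writer.writerows(rows)

def spss_setup(data_file):
    """Build SPSS syntax that imports the file and applies the definitions."""
    lines = [
        "GET DATA",
        "  /TYPE=TXT",
        f'  /FILE="{data_file}"',
        "  /ARRANGEMENT=DELIMITED",
        '  /DELIMITERS=","',
        "  /QUALIFIER='\"'",
        "  /FIRSTCASE=2",  # skip the header line
        "  /VARIABLES=",
    ]
    lines += [f"    {name} {fmt}" for name, (fmt, _, _) in VARIABLES.items()]
    lines[-1] += "."  # terminate the GET DATA command
    for name, (_, label, missing) in VARIABLES.items():
        lines.append(f'VARIABLE LABELS {name} "{label}".')
        if missing is not None:
            lines.append(f"MISSING VALUES {name} ({missing}).")
    lines.append("EXECUTE.")
    return "\n".join(lines) + "\n"
```

Keeping the definitions in one structure means the data file and the setup file are generated from the same source and cannot drift apart. Labels containing double quotes would need escaping, which this sketch does not handle.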

 

Type of data | Preferred formats | Acceptable formats
Dataset (statistical file formats)
  • SPSS Portable (*.por)
  • Stata (*.dta)
  • SAS Transport (*.sas)
  • Widely used (proprietary) formats of statistical packages, e.g. SPSS (*.sav), Stata (*.dta), SAS (*.sas7bdat)
  • Tab- or comma-delimited text files (“csv”) with a setup file (setup, command or syntax file for SPSS, Stata, SAS, etc.) and the respective data definitions (variable names and labels, missing values, etc.). Alternatively, data definitions can be submitted as a DDI-XML file.
  • OpenDocument table format (*.ods), MS Excel (*.xls, *.xlsx), MS Access (*.mdb, *.accdb)
  • CSV formats without data definition files (setup, syntax, command file)
  • Column binary format (a standard making it possible to represent data as images of punch cards) or card image format.
Documentation (texts)
  • PDF/A (*.pdf)
  • Text formats (ASCII, ANSI, etc.)
  • OpenDocument text (*.odt)
  • PDF (*.pdf)
  • MS Word (*.doc, *.docx)
  • RichText (*.rtf)
  • WordPerfect (*.wpd, *.cwp, *.vwp)
  • HTML (*.htm)
Images
  • TIFF Version 6 uncompressed (*.tif)
  • JPEG 2000
  • JPEG, PNG, GIF, BMP
  • PDF/A, PDF (*.pdf)

Upon consultation, the Data Archive will also accept additional formats, especially for data, that can be converted into preferred formats for preservation. Regardless of the specific file format, datasets should always be structured in a manner that allows third parties to read and understand them.

Data must therefore not be encrypted. In addition, functions such as printing or copying should not be disabled.

Sending Data to the Archive

The Submission Information Package (SIP) can be submitted to the archive as follows:

GESIS - Leibniz-Institut für Sozialwissenschaften
Datenarchiv für Sozialwissenschaften
Oliver Watteler
Unter Sachsenhausen 6-8
50667 Köln

  • After consultation: upload to the data archive’s server (Cryptshare)
  • By email
  • Via a download server provided by the depositor