Data Services: Standards and Workflows

Preparation of Data Transfer

Publishing research data at GESIS requires preparation by the data depositor, who needs to clarify legal, organizational and technical questions. The data depositor must hold the rights to the data, and the data must be sufficiently documented and should be provided in formats suitable for long-term preservation. 

Copyright

As the creator of data and documents, you hold rights to your research results. When the results are archived, so-called 'simple' (non-exclusive) rights of use are transferred to the data archive by contract. Among other things, these simple rights of use include the right to pass data and documents on to third parties and the right to change the formats of digital objects (files, tables, etc.) for the purpose of long-term preservation. Because you can only transfer rights that you hold yourself, authorship should be clarified before the rights of use are transferred. For example, digitized books or third-party scientific publications may not be archived. 

The contractual basis of a data offer can be found here.

Data Protection

When data collected from individuals are to be archived, data protection issues often need to be addressed. Data collection is usually based on the informed consent of the participants. However, other legal conditions may apply, such as specific laws or contracts. 

Questions for data depositors include the following: 

  • Under what legal framework were the data collected? 
  • Were the data anonymized? 
  • Under what conditions can the data be offered? 

When processing research data for archiving, the following is important: 

  • There must be no legal obstacles to archiving, such as contractual agreements with a client or other legal restrictions. 
  • The data must not include direct identifiers such as names, addresses, telephone numbers, license plates, social security numbers, or similar. 
  • The data must be de facto anonymized (*). This means that information contained in the data can be traced back to an individual only with a disproportionately large effort. 
  • For research data where anonymization at the data level presents difficulties, technical and organizational measures can be used to increase data protection. Examples are: 

(*) The concept of 'anonymization' was revised by the General Data Protection Regulation (GDPR), which has applied since May 25, 2018, and by the federal and state data protection acts amended in its wake. The relevant legal bases are in particular, but not exclusively, Article 89(1) GDPR and Section 27(3) sentence 1 of the Federal Data Protection Act (BDSG). For the Federal Republic of Germany, GESIS applies the standard of de facto anonymization to data before archiving, meaning that a reference to a person can be restored only with disproportionate effort. In doing so, GESIS follows the position of the German Data Forum (RatSWD) of July 16, 2018.
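As a first, purely mechanical check against the requirement that the data contain no direct identifiers, variable names can be screened for terms such as name, address or phone. The Python sketch below does this for the header line of a CSV file. The keyword list, the function names and the sample data are assumptions made for illustration. A match only flags a variable for manual review. An empty result says nothing about de facto anonymization, because identifiers can hide in generically named variables and indirect identifiers are not covered at all.

```python
import csv
import io
import re

# Hypothetical keyword tokens based on the examples of direct identifiers
# (names, addresses, telephone numbers, license plates, social security
# numbers). Adapt the list to the naming conventions of your own data set.
IDENTIFIER_TOKENS = {
    "name", "firstname", "lastname", "surname",
    "address", "street",
    "phone", "telephone", "tel", "mobile", "email",
    "plate", "licenseplate",
    "ssn",
}


def tokens(variable_name: str) -> set[str]:
    """Split a variable name such as 'resp_LastName' into lowercase tokens."""
    # Separate camelCase parts, then split on anything that is not a letter or digit.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", variable_name)
    return {part.lower() for part in re.split(r"[^A-Za-z0-9]+", spaced) if part}


def flag_identifier_columns(csv_text: str) -> list[str]:
    """Return header names that suggest a direct identifier (for manual review)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [column for column in header if tokens(column) & IDENTIFIER_TOKENS]


sample = "id,resp_name,PhoneNumber,age,v12\n1,Jane Doe,555-0100,34,2\n"
print(flag_identifier_columns(sample))  # ['resp_name', 'PhoneNumber']
```

Matching on tokens instead of raw substrings avoids flagging names like "hotel_stays" for "tel". It will still miss identifiers stored under neutral names such as "v12", which is why the check can only supplement a manual review.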

Data and Documents

The data depositor compiles a Submission Information Package (SIP), which should include the following components: 

  • Machine-readable data set 
  • Method report (handbook on creating a method report, available in German)
  • Data collection instrument / measuring instrument (original or copy, preferably in electronic form) 
  • Code plan (codebook) for the unprocessed data 
  • Informed consent form, or information on how informed consent was obtained 
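Before submission, a depositor can keep a simple checklist recording which files cover each SIP component listed above. The Python sketch below assumes a hypothetical manifest that maps short component keys to file names. The keys, descriptions and file names are illustrative only and do not reflect a GESIS convention.

```python
# SIP components as listed above; the short keys are hypothetical labels.
SIP_COMPONENTS = {
    "data": "machine-readable data set",
    "method_report": "method report",
    "instrument": "data collection / measuring instrument",
    "code_plan": "code plan for the unprocessed data",
    "consent": "informed consent form or information on obtaining consent",
}


def missing_components(manifest: dict[str, list[str]]) -> list[str]:
    """Return the SIP components for which the manifest lists no file."""
    return [
        description
        for key, description in SIP_COMPONENTS.items()
        if not manifest.get(key)
    ]


# Hypothetical manifest: component key -> files that cover it.
manifest = {
    "data": ["survey.dta"],
    "method_report": ["method_report.pdf"],
    "instrument": ["questionnaire.pdf"],
    "code_plan": [],
}
for description in missing_components(manifest):
    print("missing:", description)
```

An empty file list counts as missing, the same as an absent key. This way a placeholder entry cannot hide a gap.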

Materials in PDF format must be free of protection mechanisms such as passwords or restrictions on copying and editing. Otherwise they cannot be processed, for example migrated to newer file formats. 
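In PDF files, password protection and permission restrictions (for example on printing, copying or editing) are both implemented through an encryption dictionary, referenced by an /Encrypt key in the file trailer. A rough pre-submission check can therefore look for that key in the raw bytes. The Python sketch below is a heuristic under that assumption, and the function names are hypothetical. It can in principle give a false positive if the string occurs uncompressed elsewhere in the file. A PDF library gives a more reliable answer: in pypdf, for example, PdfReader exposes an is_encrypted property.

```python
from pathlib import Path


def pdf_problems(data: bytes) -> list[str]:
    """Heuristically list reasons why a PDF may be unsuitable for archiving."""
    problems = []
    # PDF files normally begin with a header such as b"%PDF-1.7".
    if not data.startswith(b"%PDF-"):
        problems.append("no PDF header")
    # Passwords and permission restrictions both rely on an encryption
    # dictionary referenced by an /Encrypt key, which is itself not encrypted.
    # The string could in principle also occur elsewhere (false positive).
    if b"/Encrypt" in data:
        problems.append("encrypted or permission-restricted")
    return problems


def check_pdf_file(path: str) -> list[str]:
    """Read a file from disk and apply the heuristic check."""
    return pdf_problems(Path(path).read_bytes())


# Minimal in-memory example of a trailer that references an encryption dictionary.
protected = b"%PDF-1.7\n...\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF\n"
print(pdf_problems(protected))  # ['encrypted or permission-restricted']
```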

Recommended File Formats

To keep data interpretable and usable in the long term, choosing suitable file formats is particularly important. Like hardware, software evolves constantly: programs gain new functionality or are adapted to new operating systems, and either change can bring a change in the file format. All digital data are therefore at constant risk from changes in the hardware and software environment. Choosing suitable formats is one of the ways to reduce this risk. 

As a rule, data sets should be transferred in a form that can be used with one of the widely used statistical packages (SPSS, Stata or R). There are several ways to do this: 

  • Data can be transferred in the proprietary formats of the common statistical programs as so-called system files (e.g. SPSS System File). 
  • Data can be transferred in text-based (comma-, tab- or column-separated) formats together with corresponding setup or syntax files for reading into the respective statistical programs. 
  • Data in software-specific portable file formats are also accepted. 
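The second option, a text file plus a setup file, can be generated from a list of variable definitions. The Python sketch below writes a comma-separated file and a matching Stata do-file that applies variable labels, value labels and missing-value codes. Several details are assumptions for illustration: the variable metadata, the file name survey.csv, and the choice of Stata rather than SPSS or SAS syntax. The sketch also maps missing-value codes to Stata's extended missing values .a, .b and so on.

```python
import csv
import io

# Hypothetical variable definitions for a tiny example data set.
VARIABLES = [
    {"name": "id", "label": "Respondent ID"},
    {
        "name": "sex",
        "label": "Sex of respondent",
        "values": {1: "male", 2: "female", 9: "no answer"},
        "missing": [9],
    },
]
ROWS = [[1, 1], [2, 2], [3, 9]]


def to_csv(variables, rows) -> str:
    """Write the data as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([v["name"] for v in variables])
    writer.writerows(rows)
    return buffer.getvalue()


def stata_setup(variables, csv_name: str) -> str:
    """Build a Stata do-file that reads the CSV and applies the data definitions."""
    lines = [f'import delimited using "{csv_name}", varnames(1) clear']
    for v in variables:
        name = v["name"]
        lines.append(f'label variable {name} "{v["label"]}"')
        # Map each missing-value code to an extended missing value .a, .b, ...
        extended = {code: "." + chr(ord("a") + i)
                    for i, code in enumerate(v.get("missing", []))}
        if extended:
            rules = " \\ ".join(f"{code}={mv}" for code, mv in extended.items())
            lines.append(f"mvdecode {name}, mv({rules})")
        if "values" in v:
            # Missing codes are labeled under their new extended missing value.
            pairs = " ".join(f'{extended.get(code, code)} "{text}"'
                             for code, text in v["values"].items())
            lines.append(f"label define {name}_lbl {pairs}")
            lines.append(f"label values {name} {name}_lbl")
    return "\n".join(lines) + "\n"


csv_text = to_csv(VARIABLES, ROWS)
do_file = stata_setup(VARIABLES, "survey.csv")
print(do_file)
```

Recoding missing values to extended missing values keeps different kinds of missingness distinguishable, while preventing Stata from treating the codes as valid numbers in analyses.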

Preferred formats by data type

Data (statistics formats)

  • Stata (.dta)
  • R (.rds, .rda)
  • Other widely used (proprietary) formats of statistical packages, e.g. SPSS (.sav) 
  • Tab-, comma- or column-separated text files ("csv") with an additional setup file (setup, command or syntax file for SPSS, Stata, SAS, etc.) containing the corresponding data definitions (variable names and labels, missing values, etc.). Alternatively, the data definitions can be provided as a DDI-XML file. 

Documentation (texts)

  • PDF/A-1, A-2, A-4 (*.pdf)
  • Text formats (ASCII, ANSI, etc.)

Images

  • Baseline TIFF Version 6 uncompressed (*.tif)
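When data definitions are supplied as DDI-XML instead of a setup file, variable and value labels are expressed as XML elements. The Python sketch below builds a minimal fragment in the style of DDI Codebook 2.5 using the standard library's xml.etree.ElementTree. The namespace, the subset of elements and the example variables are assumptions. The output is not validated against the DDI schema, and a real DDI file carries much more study-level metadata.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of DDI Codebook 2.5; check it against the schema you target.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)

# Hypothetical variable definitions.
VARIABLES = [
    {"name": "id", "label": "Respondent ID"},
    {"name": "sex", "label": "Sex of respondent",
     "values": {1: "male", 2: "female", 9: "no answer"}, "missing": [9]},
]


def q(tag: str) -> str:
    """Qualify a tag name with the DDI namespace."""
    return f"{{{DDI_NS}}}{tag}"


def ddi_codebook(title: str, variables) -> str:
    """Build a minimal, unvalidated DDI Codebook fragment as a string."""
    root = ET.Element(q("codeBook"))
    # Study description reduced to the title.
    citation = ET.SubElement(ET.SubElement(root, q("stdyDscr")), q("citation"))
    title_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(title_stmt, q("titl")).text = title
    # One <var> per variable, with its label and value categories.
    data_dscr = ET.SubElement(root, q("dataDscr"))
    for v in variables:
        var = ET.SubElement(data_dscr, q("var"), {"name": v["name"]})
        ET.SubElement(var, q("labl")).text = v["label"]
        for code, text in v.get("values", {}).items():
            # Categories that represent missing values are marked missing="Y".
            attrs = {"missing": "Y"} if code in v.get("missing", []) else {}
            category = ET.SubElement(var, q("catgry"), attrs)
            ET.SubElement(category, q("catValu")).text = str(code)
            ET.SubElement(category, q("labl")).text = text
    return ET.tostring(root, encoding="unicode")


print(ddi_codebook("Example survey", VARIABLES))
```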

Accepted formats by data type

Data (statistics formats)

  • SAS Transport, SAS, SPSS Portable
  • OpenDocument Spreadsheet (*.ods), MS Excel (*.xls, *.xlsx)
  • CSV formats without additional data definition files (setup, syntax or command files) 
  • Column binary format (a standard for representing data as images of punched cards) or card image format. 

Documentation (texts)

  • OpenDocument Text (*.odt)
  • PDF (*.pdf)
  • MS Word (*.doc, *.docx)
  • Rich Text Format (*.rtf)
  • WordPerfect (*.wpd, *.cwp, *.vwp)
  • HTML (*.htm)

Images

  • JPEG 2000
  • JPEG, PNG, GIF, BMP
  • PDF/A-1, A-2, A-4, PDF (*.pdf)
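The preferred and accepted format lists can also be applied mechanically to the files of a submission. The Python sketch below sorts file names into preferred, accepted or not listed by extension. Several parts of it are assumptions:
  • the extensions for formats the lists name without one (for example .xpt for SAS Transport, .por for SPSS Portable, .jp2 for JPEG 2000, .txt for plain text);
  • the convention that a CSV file counts as preferred only when a setup file with the same base name (.do, .sps, .sas or .xml) is present.
An extension cannot show whether a PDF conforms to PDF/A or whether a TIFF is uncompressed, so the result is only a first pass.

```python
from pathlib import PurePath

# Extensions from the format lists above; extensions for formats the lists
# name without one (e.g. .xpt, .por, .jp2, .txt) are assumptions.
PREFERRED = {".dta", ".rds", ".rda", ".sav", ".txt", ".tif"}
ACCEPTED = {
    ".xpt", ".sas7bdat", ".por", ".ods", ".xls", ".xlsx",
    ".odt", ".doc", ".docx", ".rtf", ".wpd", ".cwp", ".vwp", ".htm", ".html",
    ".jp2", ".jpg", ".jpeg", ".png", ".gif", ".bmp",
}
# Hypothetical setup or data definition files: Stata, SPSS, SAS syntax, DDI-XML.
SETUP = {".do", ".sps", ".sas", ".xml"}


def classify(filenames: list[str]) -> dict[str, str]:
    """Assign each file name a format category based on its extension."""
    stems_with_setup = {PurePath(n).stem for n in filenames
                        if PurePath(n).suffix.lower() in SETUP}
    result = {}
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext in SETUP:
            result[name] = "setup file"
        elif ext == ".csv":
            # CSV counts as preferred only together with a matching setup file.
            result[name] = "preferred" if path.stem in stems_with_setup else "accepted"
        elif ext == ".pdf":
            # The extension cannot show whether the file conforms to PDF/A.
            result[name] = "accepted (preferred if PDF/A)"
        elif ext in PREFERRED:
            # For .tif this assumes an uncompressed baseline TIFF.
            result[name] = "preferred"
        elif ext in ACCEPTED:
            result[name] = "accepted"
        else:
            result[name] = "not listed"
    return result


files = ["survey.csv", "survey.do", "report.pdf", "scan.tif", "answers.xlsx", "notes.pages"]
for name, category in classify(files).items():
    print(f"{name}: {category}")
```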