The major challenge for long-term digital preservation is to preserve data, metadata and documents in such a way that both the readability of the files and the interpretability of their contents can be guaranteed. In this context, long-term archiving is subject to continuous technical change.
The work steps outlined in the other sections of these standards and workflows help to address this change using standardized, platform-independent tools that have proven themselves in practice. GESIS has been facing this challenge since 1960.
The Open Archival Information System (OAIS, ISO standard 14721) provides for three packages of data and documents. GESIS follows this logic.
- For the long-term digital preservation and publication of their data, data depositors assemble an information package from several objects, the so-called Submission Information Package (SIP). The SIP includes in particular the survey or data collection instrument as well as a methodological description of the data collection. The SIP is used by GESIS for the creation or update of one or more so-called Archival Information Packages (AIPs) and the associated descriptive information. The AIP is created together with the data depositors.
- A prerequisite for the creation of an AIP is a data input check. The AIP then essentially consists of three parts:
  - The data and information to be backed up.
  - Information required for interpretation:
    - origin (provenance) information,
    - references such as a Digital Object Identifier (DOI),
    - information to protect against unauthorized changes (e.g. checksums),
    - contextual information,
    - and access rights information.
  - Prepared versions of the data and documents as well as corresponding programs for data definition at GESIS.
    - These are, for example, (commented) syntax files for SPSS or do-files for Stata.
    - These establish the reference between original and archive versions.
- A Dissemination Information Package (DIP) is created from the AIP. This is made available to users via various GESIS services.
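
As a rough illustration, the three AIP parts listed above could be modelled as follows. This is a minimal sketch, not the actual GESIS data model: the class and field names, the `build_aip` helper, the choice of SHA-256 as checksum algorithm, and the placeholder DOI are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PreservationInfo:
    """Hypothetical container for the information required for interpretation."""
    provenance: str
    reference: str             # e.g. a DOI
    checksums: Dict[str, str]  # file name -> SHA-256 hex digest
    context: str
    access_rights: str


@dataclass
class ArchivalInformationPackage:
    """Hypothetical model of the three AIP parts."""
    content: Dict[str, bytes]            # data and information to be preserved
    preservation_info: PreservationInfo  # interpretation and protection metadata
    prepared_files: Dict[str, bytes] = field(default_factory=dict)  # e.g. syntax files


def build_aip(content, provenance, reference, context, access_rights, prepared_files=None):
    """Assemble an AIP and record one checksum per content file."""
    checksums = {name: hashlib.sha256(data).hexdigest() for name, data in content.items()}
    info = PreservationInfo(provenance, reference, checksums, context, access_rights)
    return ArchivalInformationPackage(dict(content), info, dict(prepared_files or {}))


aip = build_aip(
    content={"survey.csv": b"id,answer\n1,yes\n"},
    provenance="delivered by the data depositor as part of a SIP",
    reference="doi:10.0000/placeholder",  # placeholder, not a real DOI
    context="example study",
    access_rights="restricted",
    prepared_files={"define.do": b"* commented Stata do-file\n"},
)
```

Recording the checksums when the package is built means that any later change to a content file can be detected by recomputing the digest and comparing it with the stored value.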
All delivered data is stored in the archiving system according to defined rules. Data archiving takes place along the life cycle of research data.
The archive system is subject to strict access rights. Only authorized employees are allowed to transfer objects eligible for long-term preservation to the digital archive or remove them from the archive.
The archive system has a file-based directory structure. The files are stored in defined directories according to certain rules and named according to a uniform scheme. The originals, i.e. the files supplied by the data depositor, are stored and preserved in their original form with the original file names. This procedure expresses the technical and logical relationships of the objects to each other.
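
The idea of storing originals unchanged inside a rule-based directory structure can be sketched as follows. The layout `<study id>/<version>/original/`, the function name `archive_original`, and the identifiers `study-0001` and `v1-0-0` are hypothetical; the actual GESIS directory and naming rules are not described here.

```python
import shutil
import tempfile
from pathlib import Path


def archive_original(source: Path, archive_root: Path, study_id: str, version: str) -> Path:
    """Copy a delivered file into the archive tree without renaming it."""
    # Assumed layout: <archive_root>/<study_id>/<version>/original/
    target_dir = archive_root / study_id / version / "original"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name  # the original file name is kept
    if target.exists():
        # Never overwrite an object that has already been archived.
        raise FileExistsError(f"{target} is already archived")
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    return target


with tempfile.TemporaryDirectory() as tmp:
    delivered = Path(tmp) / "delivery" / "Survey 2020 final.sav"
    delivered.parent.mkdir()
    delivered.write_bytes(b"example payload")
    stored = archive_original(delivered, Path(tmp) / "archive", "study-0001", "v1-0-0")
    print(stored.relative_to(tmp))
```

In this sketch the directory path carries the logical relationship (which study and version a file belongs to), while the file itself stays exactly as delivered.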
In addition to the organizational and technical measures described, GESIS operates a modern, powerful IT infrastructure that ensures physical preservation. Among other things, this enables rapid recovery in the event of a disaster. The infrastructure protects the data physically ('bitstream preservation') through regular backups, storage on different types of media, redundant storage at different locations, and media migration.
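
One common way to monitor redundant copies is to compare each copy against a known checksum. The following is a minimal sketch of that idea under stated assumptions: SHA-256 as the algorithm, the helper names `file_sha256` and `divergent_copies`, and the location names `site_a`, `site_b`, and `tape` are illustrative and do not describe the GESIS infrastructure.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def divergent_copies(copies: Iterable[Path], expected_digest: str) -> List[Path]:
    """Return every copy that is missing or does not match the expected digest."""
    return [c for c in copies if not c.is_file() or file_sha256(c) != expected_digest]


with tempfile.TemporaryDirectory() as tmp:
    payload = b"archived bitstream"
    expected = hashlib.sha256(payload).hexdigest()
    copies = []
    for location in ("site_a", "site_b", "tape"):
        copy = Path(tmp) / location / "data.bin"
        copy.parent.mkdir()
        copy.write_bytes(payload)
        copies.append(copy)
    copies[1].write_bytes(b"archived bitstreem")  # simulate silent corruption
    damaged = divergent_copies(copies, expected)
    print([c.parent.name for c in damaged])  # prints ['site_b']
```

A damaged or missing copy found this way can then be restored from one of the intact copies at another location.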
A digital long-term archive is trustworthy if it can ensure the preservation of data over a long period of time in accordance with its goals and tasks and if its users, producers, operators, and partners trust it to do so.
Proof of the trustworthiness of digital archives is usually provided by certification and self-audit procedures.