Linked Open Research Data for Social Science Pilot Study (LORDpilot_SI)
Abstract
The re-use of research data is an integral part of research practice in the social and economic sciences. To find relevant data, researchers need adequate search facilities. However, a comprehensive, thematic search for research data is hampered by inconsistent or missing semantic indexing of data at the level of social science concepts, because each survey programme uses its own terminology to describe its data. Consequently, researchers have to invest considerable effort to find relevant or comparable data. From the user's perspective, the fragmentation in data documentation impedes effective data searches and thus significantly limits the research potential of existing data collections. Because there is currently no semantic model for indexing the data content, the specific challenge for improving data search lies in establishing concept-based indexing of research data. Research infrastructures need a technology for the harmonized semantic indexing of their research data. The LORD infrastructure aims to close this gap by developing a registry of sociological and economic concepts and, following the FAIR principles, making this concept registry generally available to the scientific community. The ‘LORDpilot’ project submitted here evaluates the practicability of a Concept Registry for the social and economic sciences. As a first step, the project will develop a basic data model for the Concept Registry. Based on important data collections (ALLBUS, SOEP, Nacaps), social and economic science concepts will be identified and interlinked with each other as well as with the questions and variables in the data collections. For the technical implementation, we use standards of the Semantic Web.
Linking the concepts with descriptors from the SKOS-compliant thesauri "Thesaurus Social Sciences" (TheSoz) and "Standard Thesaurus Economics" (STW) a) enhances the effectiveness of searches in the concept database and b) directly integrates the concept vocabulary into the Linked Open Data (LOD) cloud. The modelling language UML (Unified Modeling Language) is used to create the data model, and the links between concepts are created and managed in the form of RDF triples. To define relevant concepts based on the theoretical constructs used in the data, the project will employ intellectual analysis and comparison as well as research on measurement instruments and data-related publications. The initial focus is on variables and questions with overlapping content in the three survey programmes, as they form a sound basis for the cross-linking with concepts.
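To illustrate how links between a concept, thesaurus descriptors, and survey variables can be held as RDF-style triples, the following is a minimal sketch in plain Python and not the project's actual data model. The namespace `http://example.org/lord/`, the `measures` property, the concept "Life satisfaction", and all variable and descriptor identifiers are hypothetical placeholders chosen for illustration. Only the SKOS namespace and the property names `prefLabel` and `closeMatch` come from the SKOS standard.

```python
# SKOS core namespace as defined by the W3C standard.
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace for the registry; not taken from the project.
EX = "http://example.org/lord/"

# Hypothetical concept identifier, used for illustration only.
CONCEPT = EX + "concept/life-satisfaction"


def build_triples():
    """Return a set of (subject, predicate, object) triples."""
    triples = set()
    # Label the concept with a SKOS preferred label.
    triples.add((CONCEPT, SKOS + "prefLabel", "Life satisfaction"))
    # Link the concept to a thesaurus descriptor (placeholder URI).
    triples.add((CONCEPT, SKOS + "closeMatch", EX + "thesaurus/placeholder-descriptor"))
    # Link one made-up variable per survey programme to the concept.
    # "measures" is an assumed property name, not a standard one.
    for variable in ("allbus/var-a", "soep/var-b", "nacaps/var-c"):
        triples.add((EX + variable, EX + "measures", CONCEPT))
    return triples


def variables_for(triples, concept):
    """Return, in sorted order, all variables linked to the given concept."""
    return sorted(
        subject
        for (subject, predicate, obj) in triples
        if predicate == EX + "measures" and obj == concept
    )


TRIPLES = build_triples()
```

Calling `variables_for(TRIPLES, CONCEPT)` returns the three placeholder variables, one from each survey programme. This reflects the kind of cross-programme lookup that concept-based indexing is meant to support. A real implementation would more likely use an RDF library and a triple store than an in-memory set.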