Knowledge Graph Infrastructure

Dr. Benjamin Zapilko

Knowledge Technologies for the Social Sciences
Information Extraction & Linking
Team Leader

+49 (221) 47694-515

The aim of the Knowledge Graph (KG) infrastructure is to link social science research data and resources across GESIS and to make them interoperable and findable on the Web. It is built on a social science knowledge graph that links the GESIS data collections with one another and with established vocabularies, social science data sources, and established knowledge bases on the Web such as Wikidata.

The KG will also be enriched with extracted entities, such as variables, and with links, for example between publications and research data. Both survey data and digital behavioral data are taken into account. The rich information in the Social Science Knowledge Graph will be integrated into GESIS services such as the GESIS-wide search to support users, e.g. when they search for research data.

Based on this, additional knowledge graphs will be provided and linked in the infrastructure that hold data, entities and their relationships relevant to social science research topics, such as ClaimsKG, a graph of annotated claims extracted from fact checking websites.

For the development of the knowledge graph infrastructure in general and the social science knowledge graph in particular, methods of information extraction, entity interlinking, coreference resolution and data fusion are being investigated and applied.
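As a rough illustration of what entity interlinking can involve, the following minimal sketch matches local keyword labels against a controlled vocabulary by comparing normalized labels. It is not the method used in the infrastructure; the function names, identifiers, and labels are hypothetical, and exact matching on normalized strings stands in for the more elaborate techniques mentioned above.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so labels compare loosely."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def interlink(local_entities: dict, vocabulary: dict) -> list:
    """Return (local_id, concept_id) pairs whose normalized labels match exactly."""
    index = {normalize(label): concept_id for concept_id, label in vocabulary.items()}
    links = []
    for local_id, label in local_entities.items():
        target = index.get(normalize(label))
        if target is not None:
            links.append((local_id, target))
    return links


# Hypothetical identifiers and labels, for illustration only.
local_entities = {"kw:1": "Unemployment ", "kw:2": "Migración", "kw:3": "Housing"}
vocabulary = {"concept:A": "unemployment", "concept:B": "migracion"}

links = interlink(local_entities, vocabulary)
print(links)
```

Here "Housing" stays unlinked because the toy vocabulary has no matching concept; a real pipeline would typically add fuzzy matching and manual review for such cases.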

Projects and datasets in the context of the Knowledge Graph infrastructure

  • GESIS-wide search: The Knowledge Graph infrastructure is integrated into the backend of the GESIS-wide search and thus provides users with structured information on linked research data, publications, etc.
  • GESIS Research Graph: In the GESIS Research Graph project, a prototype graph has been developed that links publications, research data, projects and people. The GESIS Research Graph is based on the Knowledge Graph infrastructure and contains over 110,000 publications, over 6,200 research records, and over 53,000 research projects.
  • InFoLiS: In the project InFoLiS - Integration of Research Data and Literature, a method has been investigated and developed for detecting citations of research datasets in scientific publications. The resulting links between publications and research data are integrated into the Social Science Knowledge Graph.
  • EXCITE: In the EXCITE - Extraction of Citations from PDF Documents project, procedures were developed to extract and structure literature citations from scientific publications. The extracted references (over 1 million) were delivered to the Open Citations Corpus (OCC). Of these, over 300,000 links to publications in GESIS data collections were identified, which will be integrated into the Social Science Knowledge Graph.
  • OpenMinTeD: In the OpenMinTeD project, methods have been investigated and developed to identify mentions of variables in scientific publications. The 415 generated links between publications and variables will be integrated into the Social Science Knowledge Graph.
  • MOVING: In the project MOVING, methods were investigated and developed to disambiguate authors. The methods are used to disambiguate person names from various data sources in the Knowledge Graph infrastructure, as well as to identify and resolve duplicates in the records.
  • SoRa: In the project SoRa - Social Spatial Research Data Infrastructure, a knowledge graph is under development that describes social science survey data at study, variable and question level. So far, the graph represents two complementary datasets from different institutes and will be extended by links to spatial data.
  • ClaimsKG: ClaimsKG is a knowledge graph that contains claims and their evaluations from fact checking websites and links relevant entities to DBpedia concepts. The KG currently holds 28,383 claims from 6 English-language websites.
  • TweetsKB: TweetsKB is a knowledge graph hosted at the L3S Research Center that includes metadata about 1.5 billion tweets (Feb. 2013 - Mar. 2018) and serves as a resource for social science research. Using information extraction methods, sentiments, entities, hashtags, and user mentions were extracted and published as linked data through a structured RDF schema.
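
Several of the projects above contribute links (publication to dataset, publication to variable) to the graph. The following minimal sketch shows how such links might be held as subject-predicate-object triples and queried in both directions. It uses plain Python rather than an RDF library, and all predicate names and identifiers are hypothetical; it does not reflect the actual schema of the Social Science Knowledge Graph.

```python
from collections import defaultdict

# Hypothetical predicate names and identifiers, chosen for illustration only.
CITES_DATASET = "citesDataset"
MENTIONS_VARIABLE = "mentionsVariable"

triples = [
    ("pub:1", CITES_DATASET, "data:10"),
    ("pub:1", MENTIONS_VARIABLE, "var:7"),
    ("pub:2", CITES_DATASET, "data:10"),
    ("pub:3", CITES_DATASET, "data:11"),
]


def build_indexes(triples):
    """Index triples by (subject, predicate) and by (object, predicate)."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for subject, predicate, obj in triples:
        forward[(subject, predicate)].add(obj)
        backward[(obj, predicate)].add(subject)
    return forward, backward


forward, backward = build_indexes(triples)

# Which datasets does pub:1 cite, and which publications cite data:10?
datasets_of_pub1 = sorted(forward.get(("pub:1", CITES_DATASET), set()))
citing_data10 = sorted(backward.get(("data:10", CITES_DATASET), set()))
print(datasets_of_pub1)
print(citing_data10)
```

The backward index is what would let a search service show, for a given dataset, the publications that cite it.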