Text and Data Mining

Dr. Benjamin Zapilko

Wissenstechnologien für Sozialwissenschaften (Knowledge Technologies for the Social Sciences), Data Linking

+49 (221) 47694-515

Text and data mining comprises the development and application of methods designed to extract knowledge relevant to the social sciences from unstructured text and data streams.

The main research areas are:

  • Detection of statistical regularities in text and data, and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g. Wikipedia articles or scientific publications) and of the social dynamics of their creation (e.g. conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g. the decisions made when navigating the web, or individual movement in urban environments)
  • Detection, disambiguation, and linking of entities of interest to the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing

Publications:

  • Schaible, Johann, Pedro Szekely, and Ansgar Scherp. 2016 (forthcoming). "Comparing Vocabulary Term Recommendations Using Association Rules and Learning to Rank: A User Study." In The Semantic Web: Latest Advances and New Domains.
  • Schaible, Johann, Thomas Gottron, and Ansgar Scherp. 2016 (forthcoming). "TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud." In The Semantic Web: Latest Advances and New Domains.
  • Zapilko, Benjamin, Johann Schaible, Timo Wandhöfer, and Peter Mutschke. 2015. "Applying Linked Data Technologies in the Social Sciences." KI - Künstliche Intelligenz (online first): 1-4. doi: http://dx.doi.org/10.1007/s13218-015-0416-6.
  • Zapilko, Benjamin, and Brigitte Mathiak. 2014. "Object Property Matching Utilizing the Overlap between Imported Ontologies." In The Semantic Web: Trends and Challenges. 11th International Conference, ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014, Proceedings, edited by Valentina Presutti, Claudia d'Amato, and Fabien Gandon, 737-751. Lecture Notes in Computer Science 8465. Cham: Springer. http://2014.eswc-conferences.org/sites/default/files/papers/paper_65.pdf.
  • Boland, Katarina, Dominique Ritze, Kai Eckert, and Brigitte Mathiak. 2012. "Identifying References to Datasets in Publications." In TPDL 2012: Theory and Practice of Digital Libraries, Paphos.