Text and Data Mining

Text and data mining comprises the development and application of methods for extracting knowledge relevant to the social sciences from unstructured texts or data streams.

Main research areas are:

  • Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and the social dynamics of the creation process (e.g., conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
  • Detection, disambiguation and linking of entities which are of interest for the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing
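
As an illustration of the sequential-behavior item in the list above, one common baseline is a first-order Markov chain estimated from observed navigation paths. The sketch below is only an example under stated assumptions, not the group's actual method: the function name `transition_probabilities` and the click paths are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    `sessions` is a list of state sequences (e.g., pages visited in order).
    Returns a nested dict: probs[current_state][next_state] -> probability.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each observed step from one state to the next.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Normalize the counts per source state so each row sums to 1.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical click paths, invented for this example.
sessions = [
    ["home", "search", "article"],
    ["home", "article"],
    ["home", "search", "search", "article"],
]
probs = transition_probabilities(sessions)
```

States that never occur as the start of a step (here "article") get no row in the result. A real analysis would typically also need smoothing for unseen transitions and a comparison against higher-order models.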

Publications:

  • Zielinski, Andrea, and Peter Mutschke. 2018. "Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications." In Proceedings of LREC 2018.
  • Zielinski, Andrea, and Peter Mutschke. 2017. "Mining Social Science Publications for Survey Variables." In Proceedings of the Second Workshop on Natural Language Processing and Computational Social Science, Vancouver, Canada, August 3, 2017, edited by Dirk Hovy, Svitlana Volkova, and David Bamman, 47–52. Association for Computational Linguistics. aclweb.org/anthology/W17-29.
  • Dulisch N., Mathiak B. (2017) Towards Finding Animal Replacement Methods. In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham
  • Schaible, Johann, Pedro Szekely, and Ansgar Scherp. 2016 (Forthcoming). "Comparing Vocabulary Term Recommendations using Association Rules and Learning To Rank: A User Study." In The Semantic Web. Latest Advances and New Domains.
  • Schaible, Johann, Thomas Gottron, and Ansgar Scherp. 2016 (Forthcoming). "TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud." In The Semantic Web. Latest Advances and New Domains.
  • Mathiak, B.; Boland, K. (2015): Challenges in Matching Dataset Citation Strings to Datasets in Social Science. D-Lib Magazine 21 (1/2). doi.org/10.1045/january2015-mathiak
  • Ritze, D.; Boland, K. (2013): Integration of Research Data and Research Data Links into Library Catalogues. Proceedings of the International Conference on Dublin Core and Metadata Applications (DC 2013).
  • Boland, Katarina, Dominique Ritze, Kai Eckert, and Brigitte Mathiak. 2012. "Identifying References to Datasets in Publications." TPDL 2012 : Theory and Practice of Digital Libraries, Paphos.