GESIS Leibniz Institute for the Social Sciences

Text and Data Mining

Text and data mining comprises the development and application of methods designed to extract knowledge relevant to the social sciences from unstructured texts or data streams.

Main research areas are:

  • Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and the social dynamics of the creation process (e.g., conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
  • Detection, disambiguation and linking of entities which are of interest for the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing

Publications:

  • Kohne, Julian, Jon Elhai, and Christian Montag. 2022 (Forthcoming). "A Practical Guide to WhatsApp Data in Social Science Research." In Digital Phenotyping and Mobile Sensing, edited by Harald Baumeister and Christian Montag, 171–205. Cham: Springer. doi: https://doi.org/10.1007/978-3-030-98546-2_11.
  • Dimitrov, Dimitar, Dennis Segeth, and Stefan Dietze. 2022. "Geotagging TweetsCOV19: Enriching a COVID-19 Twitter Discourse Knowledge Base with Geographic Information." In Companion Proceedings of WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25–29, 2022, edited by Frédérique Laforest, Raphaël Troncy, Lionel Médini, and Ivan Herman, 438–442. New York: ACM. doi: https://doi.org/10.1145/3487553.3524623.
  • Schoch, David, Franziska B. Keller, Sebastian Stier, and JungHwan Yang. 2022. "Coordination patterns reveal online political astroturfing across the world." Scientific Reports 12: 4572. doi: https://doi.org/10.1038/s41598-022-08404-9.
  • Dimitrov, Dimitar, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and Stefan Dietze. 2020. "TweetsCOV19: A knowledge base of semantically annotated tweets about the COVID-19 pandemic." In CIKM '20: Proceedings of the 29th ACM international conference on information & knowledge management, edited by Mathieu d'Aquin and Stefan Dietze, 2991–2998. New York: ACM. doi: https://doi.org/10.1145/3340531.3412765. https://arxiv.org/pdf/2006.14492v4.pdf.
  • Lietz, Haiko. 2020. "Drawing impossible boundaries: Field delineation of Social Network Science." Scientometrics 125: 2841–2876. doi: https://doi.org/10.1007/s11192-020-03527-0.