GESIS Leibniz Institute for the Social Sciences

Text and Data Mining

Text and data mining comprises the development and application of methods designed to extract knowledge relevant to the social sciences from unstructured texts or data streams.

The main research areas are:

  • Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and the social dynamics of the creation process (e.g., conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
  • Detection, disambiguation, and linking of entities of interest to the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing

Name | Department | Team | Telephone
Assenmacher, Dennis | Computational Social Science | Data Science Methods | +49 (0221) 47694-484
Backes, Tobias | Knowledge Technologies for the Social Sciences | Information and Data Retrieval | +49 (0221) 47694-539
Batzdorfer, Dr. Veronika | Computational Social Science | Digital Society Observatory | +49 (0221) 47694-452
Bensmann, Felix | Knowledge Technologies for the Social Sciences | Information Extraction and Linking | +49 (0221) 47694-524
Bleier, Dr. Arnim | Computational Social Science | Transparent Social Analytics | +49 (0221) 47694-514
Breuer, Dr. Johannes | Survey Data Curation | Survey Data Augmentation | +49 (0221) 47694-471
Dahou, Abdelhalim Hafedh | Knowledge Technologies for the Social Sciences | FAIR Data | +49 (0221) 47694-430
Froehling, Leon | Computational Social Science | Digital Society Observatory | +49 (0221) 47694-585
Kohne, Julian | Computational Social Science | Designed Digital Data | +49 (0221) 47694-222
Linzbach, Stephan | Knowledge Technologies for the Social Sciences | Big Data Analytics | +49 (0221) 47694-715
Mathiak, Dr. Brigitte | Knowledge Technologies for the Social Sciences | FAIR Data | +49 (0221) 47694-510
Otto, Wolfgang | Knowledge Technologies for the Social Sciences | Information Extraction and Linking | +49 (0221) 47694-543
Soldner, Felix | Computational Social Science | Digital Society Observatory | +49 (0221) 47694-234
Stier, Prof. Dr. Sebastian | Computational Social Science | | +49 (0221) 47694-221
Strohmaier, Prof. Dr. Markus | Präsidialbereich | | +49 (0221) 47694-225
Wagner, Prof. Dr. Claudia | Computational Social Science | | +49 (0221) 47694-224
Zagovora, Olga | Computational Social Science | Digital Society Observatory | +49 (0221) 47694-216
Ziaja, Dr. Sebastian | Survey Data Curation | Survey Data Augmentation | +49 (0221) 47694-462
Zloch, Dr. (rer. nat.) Matthäus | | | +49 (0221) 47694-534

  • Dahou, Abdelhalim Hafedh, Mohamed Amine Cheragui, and Ahmed Abdelali. 2023. "Performance Analysis of Arabic Pre-Trained Models on Named Entity Recognition Task." In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, edited by Ruslan Mitkov and Galia Angelova, 458–467. Shoumen: INCOMA Ltd. https://aclanthology.org/2023.ranlp-1.51.pdf.
  • Diera, Andor, Abdelhalim Hafedh Dahou, Lukas Galke, Fabian Karl, Florian Sihler, and Ansgar Scherp. 2023. "GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding." First GenBench workshop on generalisation (benchmarking) in NLP, EMNLP 2023. doi: https://doi.org/10.48550/arXiv.2311.09707.
  • Dahou, Abdelhalim Hafedh, and Brigitte Mathiak. 2023. "Subject Classification of Software Repository." In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR, 1, 30–38. SciTePress. doi: https://doi.org/10.5220/0012159600003598.
  • Lietz, Haiko, Mohsen Jadidi, Daniel Kostic, Milena Tsvetkova, and Claudia Wagner. 2023 (Forthcoming). "Individual and gender inequality in computer science: A career study of cohorts from 1970 to 2000." Quantitative Science Studies.
  • Kohne, Julian. 2023. "ChatDashboard - A Framework to collect, link, and process donated WhatsApp Chat Log Data." European Survey Research Association Conference (ESRA), University of Milano-Bicocca, Milan, 2023-07-18.