Text and Data Mining

Text and data mining comprises the development and application of methods designed to extract knowledge relevant to the social sciences from unstructured texts or data streams.

Our research on Text and Data Mining

  • Detection of statistical regularities in data and text, and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications), and analysis of the social dynamics of their creation (e.g., conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
  • Detection, disambiguation, and linking of entities of interest to the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing

Publications

  • Bensmann, Felix, and Benjamin Zapilko. 2023. ScienceLinker - Python Package. https://pypi.org/project/sciencelinker/.
  • Lietz, Haiko. 2024. "Practical computational analytical sociology." 16th Annual Conference of the International Network of Analytical Sociology, Leipzig University, Leipzig, 2024-05-30.
  • Dahou, Abdelhalim Hafedh, and Brigitte Mathiak. 2024 (forthcoming). "Automatic Categorization of Software Repository Domains with Minimal Resources." In Communications in Computer and Information Science (CCIS) book series.
  • Abdedaiem, Amin, Abdelhalim Hafedh Dahou, Mohamed Amine Cheragui, and Brigitte Mathiak. 2024. "FASSILA: A Corpus for Algerian Dialect Fake News Detection and Sentiment Analysis." In ACLing 2024: 6th International Conference on AI in Computational Linguistics, edited by Khaled Shaalan and Samhaa El-Beltagy, Procedia Computer Science 244, 397-407. Elsevier. https://doi.org/10.1016/j.procs.2024.10.214.
  • Dahou, Abdelhalim Hafedh, Mohamed Amine Cheragui, Amin Abdedaiem, and Brigitte Mathiak. 2024. "Enhancing Model Performance through Translation-based Data Augmentation in the context of Fake News Detection." In ACLing 2024: 6th International Conference on AI in Computational Linguistics, edited by Khaled Shaalan and Samhaa El-Beltagy, Procedia Computer Science 244, 342-352. Elsevier. https://doi.org/10.1016/j.procs.2024.10.208.

Projects

Title | Start | End | Funder
Kompetenzzentrum Datenqualität in den Sozialwissenschaften (KODAQS) | 2023-11-15 | 2026-11-14 | Bund (German federal government)
NFDI for Data Science and Artificial Intelligence (NFDI4DS) | 2021-10-01 | 2026-09-30 | DFG
NFDI for Business, Economic and Related Data (BERD@NFDI) | 2021-10-01 | 2026-09-30 | DFG
Dehumanization Online: Measurement and Consequences (Professorinnenprogramm) (DeHum) | 2021-01-01 | 2027-03-31 | SAW (Leibniz)

Find out more about our consulting and services: