GESIS Leibniz Institute for the Social Sciences

Text and Data Mining

Text and data mining covers the development and application of methods that extract knowledge relevant to the social sciences from unstructured texts or data streams.

Main research areas are:

  • Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
  • Combination of digital behavioral data and survey data to create new types of user models
  • Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and of the social dynamics of their creation process (e.g., conflicts, productivity)
  • Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
  • Detection, disambiguation and linking of entities which are of interest for the social sciences in academic publications (especially references to research data)
  • Extraction of key information from texts and (semi-)automatic indexing

Publications:

  • Sen, Indira, Mattia Samory, Claudia Wagner, and Isabelle Augenstein. 2022. "Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, edited by Marine Carpuat, Marie-Catherine de Marneffe, and Ivan Vladimir Meza Ruiz, 4716–4726. Seattle: Association for Computational Linguistics. doi: https://doi.org/10.18653/v1/2022.naacl-main.347.
  • Soldner, Felix, Bennett Kleinberg, and Shane Johnson. 2022. Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin. https://osf.io/29euc/?view_only=d382b6f03e1444ffa83da3ea04f1a04a.
  • Batzdorfer, Veronika. 2022. "Conspiracy theories on Twitter: Emerging motifs and temporal dynamics during the COVID-19 pandemic." ODISSEI Conference for Social Science in the Netherlands 2022, Open Data Infrastructure for Social Science and Economic Innovations, Utrecht, 03.11.2022.
  • Martins Rosa, Jorge, N. Gizem Bacaksizlar Turbic, Alda Magalhães Telles, Clara González Tosat, Cristian Jiménez Ruiz, Kalliopi Moraiti, Özgür Karadeniz, and Valentina Pallacci. 2022. "Exploring User Engagement with Portuguese Political Party Pages on Facebook: Data Sprint as Workflow." Dígitos. Revista de Comunicación Digital 8: 127–154. doi: https://doi.org/10.7203/drdcd.v1i8.233.
  • Soldner, Felix, Fabian Plum, Bennett Kleinberg, and Shane Johnson. 2022. "From the dark to the surface web: Scouting eBay for counterfeits." ODISSEI Conference for Social Science in the Netherlands 2022, Open Data Infrastructure for Social Science and Economic Innovations, Utrecht, 03.11.2022.