Data Science & Natural Language Processing (NLP)

Our research objective in data science and natural language processing (NLP) is to develop innovative methods and tools for collecting, using, processing, and analyzing research data, such as data from social media or smart devices.

The main areas of application are linking digital behavioral data with survey data, and developing and validating computational social science methods that measure socially relevant constructs from digital behavioral data.

Another focus is the development of NLP methods for the automated indexing and processing of unstructured scientific information resources, such as publications or data sets, to improve their findability, usability, and reproducibility.

Our research helps improve the quality of research data and of computational social science methods (such as machine learning models).

Research Output

  • Clemm von Hohenberg, Bernhard, Sebastian Stier, Ana S. Cardenal, Andrew M. Guess, Ericka Menchen-Trevino, and Magdalena Wojcieszak. 2024. "Analysis of Web Browsing Data: A Guide." Social Science Computer Review 42 (6): 1479-504. doi: 10.1177/08944393241227868.
  • Feger, Marc, and Stefan Dietze. 2024. "BERTweet’s TACO Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On Twitter." In Findings of the Association for Computational Linguistics: NAACL 2024, ed. Kevin Duh, Helena Gomez, and Steven Bethard, 2256-66. Mexico City, Mexico: Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-naacl.146.
  • Kohne, Julian, and Christian Montag. 2024. "ChatDashboard: A Framework to collect, link, and process donated WhatsApp Chat Log Data." Behavior Research Methods 56: 3658-84. doi: 10.3758/s13428-023-02276-1.
  • Maurer, Maximilian, Tanise Ceron, Sebastian Padó, and Gabriella Lapesa. 2024. "Toeing the Party Line: Election Manifestos as a Key to Understand Political Discourse on Twitter." In Findings of the Association for Computational Linguistics: EMNLP 2024, ed. Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, 6115-30. Miami: Association for Computational Linguistics. doi: 10.18653/v1/2024.findings-emnlp.354.
  • Ulloa, Roberto, Frank Mangold, Felix Schmidt, Judith Gilsbach, and Sebastian Stier. 2025 (Forthcoming). "Beyond time delays: How web scraping distorts measures of online news consumption." Communication Methods and Measures: 1-22. doi: 10.1080/19312458.2025.2482538.