Text and data mining comprises the development and application of methods designed to extract knowledge relevant to the social sciences from unstructured texts or data streams.
Main research areas are:
- Detection of statistical regularities in data and text and alignment of these regularities with variables of interest such as political leaning or gender
- Combination of digital behavioral data and survey data to create new types of user models
- Semantic enrichment and analysis of collaboratively generated documents (e.g., Wikipedia articles or scientific publications) and the social dynamics of the creation process (e.g., conflicts, productivity)
- Statistical modelling of sequential human behavior (e.g., the decisions made when navigating on the web or individual movement in urban surroundings)
- Detection, disambiguation and linking of entities of interest to the social sciences in academic publications (especially references to research data)
- Extraction of key information from texts and (semi-)automatic indexing
- Soldner, Felix, Justin Chun-ting Ho, Mykola Makhortykh, Isabelle W.J. van der Vegt, Maximilian Mozes, and Bennett Kleinberg. 2019. "Sentiment patterns in videos from left- and right-wing YouTube news channels." Euro CSS 2019, 2019-09-02.
- Soldner, Felix, Justin Chun-ting Ho, Mykola Makhortykh, Isabelle W.J. van der Vegt, Maximilian Mozes, and Bennett Kleinberg. 2019. "Sentiment patterns in videos from left- and right-wing YouTube news channels." NAACL 2019, Workshop NLP + CSS.
- Kohne, Julian, Jon Elhai, and Christian Montag. 2023. "A Practical Guide to WhatsApp Data in Social Science Research." In Digital Phenotyping and Mobile Sensing, edited by Harald Baumeister and Christian Montag, 171-205. Cham: Springer. doi: https://doi.org/10.1007/978-3-030-98546-2_11.
- Soldner, Felix, Bennett Kleinberg, and Shane Johnson. 2021. "Data confounds lead to performance overestimations in fake review detections." IC2S2 2021 - 7th International Conference on Computational Social Science, ETH Zürich, 2021-07-27.
- Dimitrov, Dimitar, Dennis Segeth, and Stefan Dietze. 2022. "Geotagging TweetsCOV19: Enriching a COVID-19 Twitter Discourse Knowledge Base with Geographic Information." In Companion Proceedings of WWW '22: The ACM Web Conference 2022 Virtual Event, Lyon France April 25 - 29, 2022, edited by Frédérique Laforest, Raphaël Troncy, Lionel Médini, and Ivan Herman, 438-442. New York: ACM. doi: https://doi.org/10.1145/3487553.3524623.