GESIS online talks on Social Science Methods and Research Data
With our series we offer brief insights into current social science methodological research and the design and analysis potential of GESIS research data.
Set up & registration: Each session consists of a talk and a moderated Q&A part. All talks take place online as Zoom meetings on Thursdays, 1 pm-2 pm (CET/CEST), unless stated otherwise for a session. Please register below for the session(s) you are interested in (click on the blue bars to expand). Your registration will be confirmed by email.
Data protection: Your contact information will be deleted at GESIS after the talks you registered for have been completed. More information on data protection at GESIS can be found here.
Slides and a recording of each talk will be made publicly available after the session. Please check the descriptions of past talks for the respective links (click on the blue bars to expand). For the recordings, you can also go directly to the “Meet the Experts” playlists on the GESIS YouTube channel. (Note: only the talk is recorded, not the Q&A.)
Contact, questions & feedback: You can reach the Meet the Experts team via email.
If you wish to keep up with events and other GESIS activities, please subscribe to the monthly bilingual GESIS newsletter.
Season 9: Unlocking Data Quality with KODAQS: Smart Tools for Better Social Science Data
The quality of data determines whether research findings are robust, comparable, and truly meaningful. In the social sciences – with their wide range of sources such as survey data, text data, and linked datasets – researchers face both opportunities and recurring challenges: biased samples, varying response quality, unclear provenance, or gaps in documentation. The KODAQS Meet the Experts series is dedicated to addressing these challenges. In this compact set of talks, we present, step by step, how freely available tools from the KODAQS Toolbox can be used to systematically analyze and improve data quality – from assessing representativeness and measurement quality to the transparent documentation of digital data sources. Each session combines concise input with practical demonstrations and is designed for researchers and practitioners aiming to plan, conduct, and document robust analyses in the social sciences.
16.10.2025 (THU), 13:00-14:00 (CET): KODAQS Toolbox: Ensuring Measurement Validity for computational text analysis with ValiText
Slides (2.98 MB) | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. Validation is a prerequisite for any computational text analysis. However, a common challenge in this field is the lack of conceptual clarity on how to establish validity. This talk introduces ValiText, a validation framework from the KODAQS Toolbox for computational text-based measures of social constructs. ValiText supports researchers by providing a shared vocabulary for the different validation steps, as well as practical checklists that can be downloaded and completed to document the validation process. The applicability of the tool is illustrated through a case study on measuring politicians' personalities in their political self-presentation.
Presenter:
06.11.2025 (THU), 13:00-14:00 (CET): KODAQS Toolbox: Using SampCompR for survey representation bias comparisons
Slides (3.18 MB) | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. Evaluating representation bias remains a core challenge in survey methodology, particularly when analyses extend beyond simple univariate comparisons to bivariate or multivariate structures. Built on the open-source sampcompR package, the tool provides a computationally supported workflow for diagnosing representation bias. It facilitates efficient and replicable analyses of representation bias, even in complex survey designs involving stratification, clustering, and weighting adjustments.
In this presentation, we demonstrate the application of sampcompR using an illustrative example: What biases arise in estimates when recruitment is geographically restricted within a general population survey in the United States? The results show that such a design decision can lead to substantial distortions in univariate distributions and, to a lesser extent, in bivariate associations, while multivariate models are more robust in this specific scenario.
Beyond this case study, we discuss further applications, including the evaluation of nonprobability sample estimates and the quantification of representation bias introduced by survey mode effects. The approach presented here offers survey researchers a transparent and practical framework to assess and communicate representation-related limitations of their data and findings.
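To make the idea of a univariate bias comparison concrete, here is a minimal Python sketch of one common measure: the relative bias of a weighted survey estimate against an external benchmark. It is not the sampcompR API (sampcompR is an R package); the function names, the 0/1 indicator, the weights, and the benchmark value of 0.25 are all hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean (for a 0/1 indicator, this is a weighted proportion)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def relative_bias(estimate, benchmark):
    """Relative bias of a survey estimate against a benchmark value, in percent."""
    return 100 * (estimate - benchmark) / benchmark


# Hypothetical data: 0/1 indicator (e.g., "holds a university degree") and survey weights
degree = [1, 0, 1, 0, 0, 1, 0, 0]
weights = [1.0, 1.2, 0.8, 1.0, 1.5, 0.5, 1.0, 1.0]
benchmark = 0.25  # hypothetical population value, e.g., from official statistics

estimate = weighted_mean(degree, weights)
print(f"estimate={estimate:.4f}, relative bias={relative_bias(estimate, benchmark):.1f}%")
```

Bivariate comparisons follow the same logic, with the benchmark and the survey each providing an association (e.g., a correlation) instead of a single mean.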
Presenter:
20.11.2025 (THU), 13:00-14:30 (CET): Meet the Data: Introducing the Scientific Use File for PIAAC 2023 Germany
Slides (1.42 MB) | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The second cycle of PIAAC, the Programme for the International Assessment of Adult Competencies, measured literacy, numeracy, and adaptive problem-solving skills of the adult population (aged 16 to 65) in 31 countries. In addition to the cognitive data, PIAAC also collected data from a comprehensive background questionnaire.
The German team at GESIS recently published a Scientific Use File (SUF) with the German PIAAC 2023 data. This "Meet the Data" session offers an introduction to the German SUF data for all interested data users. The objective is to provide you with general information on PIAAC, give an overview of the background questionnaire and its plethora of analytically interesting variables, and introduce the dataset, key tools, and documentation. We will go into some specifics of the German data, for example the measurement of education, point out some particularities of the PIAAC data (e.g., structure, missing scheme, weighting concept, plausible values), and talk about similarities and differences between Cycles 1 and 2 of PIAAC and how this impacts trend comparisons. This is the opportunity for you to interact with the data producers and address your questions directly.
Presenters:
11.12.2025 (THU), 13:00-14:00 (CET): KODAQS Toolbox: Resquin – Assessing response quality in multi-item scales
Slides (7.77 MB) | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), equips researchers with practical tools and tutorials for assessing and improving data quality across survey, digital behavioral, and linked data. In this talk, we highlight Resquin, a tool designed to assess measurement errors and response biases in multi-item survey scales. Attendees of this talk will learn how to:
- Detect response quality issues through distribution indicators.
- Detect and interpret problematic response styles.
- Understand strengths and limitations of different response quality indicators.
- Address poor-quality survey responses.
Resquin, based on the Resquin R package (Roth et al., 2024), provides replicable R code and guidance for enhancing the validity of survey data.
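As an illustration of what respondent-level response quality indicators look like, the following Python sketch computes three widely used ones for a single respondent's answers to a multi-item scale: the longest run of identical answers (a straightlining indicator), intra-individual response variability (the standard deviation of a respondent's answers), and the share of midpoint answers (a response style indicator). This is not Resquin's R code. The 1-5 answer scale, the function names, and the two respondents are assumptions chosen for illustration.

```python
import statistics


def longest_string(responses):
    """Longest run of identical consecutive answers (straightlining indicator).

    Assumes a non-empty list of answers.
    """
    longest = current = 1
    for prev, cur in zip(responses, responses[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


def irv(responses):
    """Intra-individual response variability: population SD of one respondent's answers."""
    return statistics.pstdev(responses)


def midpoint_share(responses, scale_min=1, scale_max=5):
    """Share of answers on the scale midpoint (midpoint response style)."""
    midpoint = (scale_min + scale_max) / 2
    return sum(r == midpoint for r in responses) / len(responses)


# Hypothetical answers of two respondents to a six-item, 5-point scale
respondents = {
    "r1": [3, 3, 3, 3, 3, 3],  # suspicious: identical midpoint answers throughout
    "r2": [1, 4, 2, 5, 3, 4],
}
for rid, answers in respondents.items():
    print(rid, longest_string(answers), round(irv(answers), 2), round(midpoint_share(answers), 2))
```

A single indicator rarely proves low-quality responding. For example, identical answers can be legitimate for consistently worded items, which is why such indicators are usually interpreted together and alongside item content.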
Presenter:
18.12.2025 (THU), 15:00-16:30 (CET): MDA - Lessons learned - Meet the Editors: Exploring the Methodological Choices, Challenges and Solutions in working with New Data Sources
Slides | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
This special session of the GESIS Meet the Experts format “Meet the Editors” is based on a Special Issue of the journal mda (Methods, Data, Analyses) that explores and promotes scholarly reflection and transparency on emerging best practices in the collection, processing, analysis and sharing of new data sources within the research community. It is a topic of growing importance for quantitative social sciences given that social media and digital trace data as well as data collected from sensors are now widely used in many areas of research. In large part, the appeal of these data is that they offer fresh insight into new and ongoing research questions and allow researchers access to ‘harder to reach’ geographies and populations. Notwithstanding their appeal, these data can often be difficult to obtain due to platform restrictions, are subject to potentially multiple sources of unknown bias and require extensive curation and cleaning prior to analysis. In this special issue, we bring together a range of papers that highlight the challenges and opportunities in working with these new and emerging digital trace data sources, and in particular the value they add when integrated with more traditional forms of data.
As well as showcasing papers that directly explore the ‘added’ value of these new data sources for investigating ongoing social science debates, the Special Issue incorporates an exciting new initiative in which authors share with readers their reflections on the process of producing their published work. Through these so-called ‘reflective appendices’ our authors explicitly confront and interrogate the assumptions and processes that guided their analysis and explain where and how these changed during the course of their research. Questions addressed include whether decisions regarding sampling frames and sizes were adjusted, whether alternative forms of data were considered and adopted or discarded, whether the methodological and analytical approaches designated at the start were modified in any way to adapt to issues in working with new data sources, and what, if any, ‘lessons learned’ they would pass on to others in the field to improve the robustness of future studies seeking to exploit these new forms of data. Longer term, we hope the Special Issue and particularly our authors’ reflective appendices serve to promote the ongoing shift toward embedding a more ‘open research’ culture within the social sciences.
The session will involve guest editors Trent D. Buskirk and Rachel Gibson and Special Issue authors in a discussion about how their work contributes to a better understanding of the challenges and solutions in working with new data sources, and what they perceive as the value of the reflective appendices, both in relation to their own work and for the discipline more generally.
Presenters:
Trent D. Buskirk,
Rachel Gibson (Guest Editors)
in conversation with authors
15.01.2026 (THU), 13:00-14:00 (CET): KODAQS Toolbox: AreaMatch - Linking survey and geospatial data with misaligned spatial units
Slides | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. In this talk, we present AreaMatch, an open-source, R-based tool that links survey data containing location information to contextual indicators tied to different spatial units, e.g., linking the population density of municipalities to respondents’ self-reported postal codes. The tool implements three matching techniques: centroid linkage, areal matching, and area-weighted interpolation. This gives users the flexibility to compare each technique's accuracy, consistency, and implications for analysis. A built-in evaluation routine quantifies differences in the linking results and their influence on substantive research questions, such as correlations between population density and political attitudes. This presentation will walk through the tool’s workflow, highlight comparative results, and conclude with practical guidance on choosing and validating spatial linkage approaches.
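The core arithmetic of area-weighted interpolation can be shown in a few lines. The Python sketch below is not AreaMatch's R implementation. It assumes that the intersection areas between one postal-code area and the municipalities it overlaps have already been computed with a GIS library, and it uses hypothetical unit names and values. For an intensive variable such as population density, the postal-code value is the average of the municipal values weighted by overlap area.

```python
def area_weighted_value(overlaps, values):
    """Area-weighted interpolation of an intensive variable (e.g., population density).

    overlaps: source-unit id -> area of its intersection with the target unit
    values:   source-unit id -> indicator value of that source unit
    """
    total = sum(overlaps.values())
    if total == 0:
        raise ValueError("target unit does not overlap any source unit")
    return sum(values[unit] * area for unit, area in overlaps.items()) / total


# Hypothetical example: one postal-code area overlapping two municipalities
density = {"muni_A": 1200.0, "muni_B": 300.0}  # inhabitants per km²
overlap_km2 = {"muni_A": 2.0, "muni_B": 6.0}   # intersection areas in km²
print(area_weighted_value(overlap_km2, density))
```

Under centroid linkage, the same postal-code area would instead receive the full value of whichever municipality contains its centroid (either 1200 or 300 here), which illustrates why the techniques can yield noticeably different contextual values. Extensive variables such as population counts would need a different weighting: each municipality's share of its own area that falls inside the target unit.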
Presenter: Anne Stroppe
05.02.2026 (THU), 13:00-14:00 (CET): KODAQS Toolbox: Preprocessing text data with TextPrep
Slides | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), equips researchers with practical tools and tutorials for assessing and improving data quality across survey, digital behavioral, and linked data. In this talk, we highlight TextPrep, a tool designed to assess how preprocessing methods such as automated translation, minor text operations, and stopword removal can significantly improve the quality of social media data, depending on the use case, data types, and methods. By systematically evaluating and comparing different approaches (e.g., different stopword lists), the tool shows how they can alter textual content and affect data interpretation and quality. Text comparison measures such as word counts and cosine similarity are used to document differences between the various preprocessing strategies and packages. In addition, structural topic modeling is applied to compare different preprocessing stages in terms of semantic coherence and exclusivity. With TextPrep, all of this can be assessed and implemented in an automated process through commented R code, which can be adapted and transferred to different use cases.
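The following Python sketch shows how word counts and cosine similarity can document what a preprocessing step changes. It compares a text's bag of words before and after stopword removal. It is not TextPrep's R code. The deliberately tiny stopword list and the example sentence are hypothetical. The list includes the negation "not", which shows how a stopword list choice can silently flip the meaning of a post.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists differ, and those differences matter
STOPWORDS = {"the", "a", "is", "of", "and", "to", "not"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (Unicode-aware, no digits or underscores)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stopwords(tokens, stopwords):
    return [t for t in tokens if t not in stopwords]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


post = "The service is not good and the staff is slow"
raw = tokenize(post)
cleaned = remove_stopwords(raw, STOPWORDS)
print(len(raw), len(cleaned), round(cosine_similarity(raw, cleaned), 3))
print(cleaned)  # the negation is gone, so "good" now reads as positive
```

Running the same comparison with several stopword lists, or over a whole corpus, is one way to quantify how strongly each preprocessing choice alters the data before any substantive model is fitted.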
Presenter: Yannik Peters
05.03.2026 (THU), 13:00-14:00 (CET): KODAQS Toolbox: Documenting and critically reflecting on online platform datasets with TES-D
Slides | Presentation on YouTube | MTE Playlist
The lecture will be held in English.
The Total Error Sheets for Datasets (TES-D) offer a structured, template-based tool for documenting datasets collected from online platforms. Particularly in Computational Social Science, digital behavioral data (DBD) from social media platforms like Facebook or X (formerly Twitter) or content platforms like YouTube and Wikipedia are frequently collected and studied. Designed to guide researchers through a critical reflection on the data collection process, TES-D helps identify and document sources of bias and error that affect the intrinsic quality of the resulting datasets. Inspired by documentation practices in Machine Learning and error frameworks from the social sciences, TES-D is built around a catalogue of questions targeted at different phases of the data collection process. By promoting transparency and standardization, TES-D supports FAIR data principles and encourages higher quality and more reusable digital behavioral data across disciplines.
Presenter: Leon Fröhling
Archive
Missed an episode? No problem! In our archive you can watch all episodes of our expert series. The event information for each talk links to its recording on YouTube. You can also download the presentation slides.
- Season 1: Survey Methodology
- Season 2: Computational Social Science and Digital Behavioral Data
- Season 3: Data and Research on Society
- Season 4: Augmenting survey data by linking and harmonisation
- Season 5: Data Services, Data Archiving, and Research Data Management
- Season 6: Knowledge technologies for the Social Science: Access to Social Science Data and Services
- Season 7: New data sets and data potentials in the Social Sciences
- Season 8: Questionnaire design – GESIS offerings to researchers in terms of services and tools