Meet the Experts

GESIS online lectures on social science methods and research data

With this series, we offer everyone interested short insights into current social science methods research and into the design and analytical potential of GESIS research data.

Format & registration: Each session consists of a talk followed by a moderated Q&A. All talks take place online as Zoom sessions on Thursdays from 13:00-14:00 (CET/CEST). Please use the registration links below the individual presentations to sign up for the session(s) you are interested in. Your registration will be confirmed by email.

Data protection: GESIS deletes your contact details once the talks you registered for have taken place. Further information on data protection at GESIS can be found here.

The slides and a recording of each talk are made publicly available after the session. The corresponding links can be found in the descriptions of past talks (click the blue bars to expand them). For the recordings, you can also go directly to the "Meet the Experts" playlists on the GESIS YouTube channel. Only the talk itself is recorded, not the Q&A.

Contact, questions, and feedback: You can reach the "Meet the Experts" team by email.

If you would like to stay informed about GESIS events and other activities, simply subscribe to the monthly bilingual GESIS newsletter.

Season 9: Securing Data Quality with KODAQS: Smart Tools for the Social Sciences

The quality of data determines whether research findings are robust, comparable, and genuinely useful. The social sciences in particular, with their wide variety of sources such as survey data, text data, and linked datasets, face not only great opportunities but also typical risks: biased samples, fluctuating response quality, unclear provenance, or gaps in documentation. The KODAQS Meet the Experts season addresses exactly these challenges. In this compact series, we show step by step how freely available tools from the KODAQS Toolbox help to analyze data quality systematically and improve it in a targeted way, from checking representativeness and assessing measurement quality to transparently describing digital data sources. The talks combine short inputs with hands-on demonstrations and are aimed at researchers and practitioners who want to plan, conduct, and transparently document robust analyses.

Registration (via Zoom)

Slides (2.98 MB) | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. Validation is a prerequisite for any computational text analysis. However, a common challenge in this field is the lack of conceptual clarity about how to establish validity. This talk introduces ValiText, a validation framework from the KODAQS Toolbox for computational text-based measures of social constructs. ValiText supports researchers by providing a shared vocabulary for the different validation steps, as well as practical checklists that can be downloaded and completed to document the validation process. The applicability of the tool is illustrated through a case study on measuring politicians' personalities in political self-presentation.

Presenter:

Lukas Birkenmaier

Registration (via Zoom)

Slides (3.18 MB) | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. Evaluating representation bias remains a core challenge in survey methodology, particularly when analyses extend beyond simple univariate comparisons to bivariate or multivariate structures. Building on the open-source sampcompR package, the tool provides a computationally supported workflow for diagnosing representation bias. It enables efficient and replicable analyses of representation bias, even in complex survey designs involving stratification, clustering, and weighting adjustments.
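To make the notion of representation bias concrete, here is a minimal Python sketch of one generic univariate measure: the relative bias of survey estimates against external benchmark values, averaged in absolute terms across variables. This is not the sampcompR API (sampcompR is a separate package with its own functions and options); the function names and all figures below are hypothetical and serve only as an illustration. Design features such as weights, strata, and clusters would enter through the way the survey estimates themselves are computed, which this sketch leaves out.

```python
def relative_bias(survey_estimate: float, benchmark: float) -> float:
    """Relative bias of a survey estimate against a benchmark (e.g., a census share)."""
    return (survey_estimate - benchmark) / benchmark


def average_absolute_relative_bias(survey: dict[str, float], benchmark: dict[str, float]) -> float:
    """Mean absolute relative bias over all variables available in both sources."""
    shared = [name for name in survey if name in benchmark]
    return sum(abs(relative_bias(survey[name], benchmark[name])) for name in shared) / len(shared)


# Hypothetical proportions, for illustration only (not results from the talk)
survey = {"female": 0.55, "age_65_plus": 0.15, "college_degree": 0.40}
benchmark = {"female": 0.51, "age_65_plus": 0.20, "college_degree": 0.35}

for variable in survey:
    print(variable, round(relative_bias(survey[variable], benchmark[variable]), 3))
print("average absolute relative bias:", round(average_absolute_relative_bias(survey, benchmark), 3))
```

A positive value means the survey overestimates the benchmark, a negative value that it underestimates it; taking absolute values before averaging keeps over- and underestimation from cancelling out.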

In this presentation, we demonstrate the application of sampcompR using an illustrative example: What biases arise in estimates when recruitment is geographically restricted within a general population survey in the United States? The results show that such a design decision can lead to substantial distortions in univariate distributions and, to a lesser extent, in bivariate associations, whereas multivariate models are more robust in this specific scenario.

Beyond this case study, we discuss further applications, including the evaluation of nonprobability sample estimates and the quantification of representation bias introduced by survey mode effects. The approach presented here offers survey researchers a transparent and practical framework to assess and communicate representation-related limitations of their data and findings.

Presenter:

Björn Rohr

Registration (via Zoom)

Slides (1.42 MB) | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

The second cycle of PIAAC, the Programme for the International Assessment of Adult Competencies, measured literacy, numeracy, and adaptive problem-solving skills of the adult population (aged 16 to 65) in 31 countries. In addition to the cognitive data, PIAAC also collected data from a comprehensive background questionnaire.

The German team at GESIS recently published a Scientific Use File (SUF) with the German PIAAC 2023 data. This "Meet the Data" session offers an introduction to the German SUF data for all interested data users. The objective is to provide general information on PIAAC, give an overview of the background questionnaire and its wealth of analytically interesting variables, and introduce the dataset, key tools, and documentation. We will go into some specifics of the German data, for example the measurement of education, point out some particularities of the PIAAC data (e.g., structure, missing value scheme, weighting concept, plausible values), and discuss similarities and differences between Cycles 1 and 2 of PIAAC and how these affect trend comparisons. This is an opportunity for you to interact with the data producers and ask your questions directly.

Presenters:

Anouk Zabal

Natascha Massing

Silke Martin

Registration (via Zoom)

Slides (7.77 MB) | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), equips researchers with practical tools and tutorials for assessing and improving data quality across survey, digital behavioral, and linked data. In this talk, we highlight Resquin, a tool designed to assess measurement errors and response biases in multi-item survey scales. Attendees of this talk will learn to:

  • Detect response quality issues through distribution indicators.
  • Detect and interpret problematic response styles.
  • Understand strengths and limitations of different response quality indicators.
  • Address poor-quality survey responses.

Resquin, based on the Resquin R package (Roth et al., 2024), provides replicable R code and guidance for enhancing the validity of survey data.
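As a rough illustration of what response quality indicators can look like, the Python sketch below computes three generic, per-respondent measures on a multi-item scale: intra-individual response variability (the standard deviation of one person's answers), the longest run of identical consecutive answers as a simple straightlining signal, and the share of extreme responses. These are common textbook definitions chosen for this sketch; they are not taken from the Resquin tool or R package, whose indicators, names, and defaults may differ. The respondents and their answers are hypothetical.

```python
from statistics import pstdev


def response_variability(responses: list[int]) -> float:
    """Intra-individual response variability: population SD of one respondent's item answers."""
    return pstdev(responses)


def longest_identical_run(responses: list[int]) -> int:
    """Length of the longest run of identical consecutive answers (a simple straightlining signal)."""
    if not responses:
        return 0
    longest = current = 1
    for previous, answer in zip(responses, responses[1:]):
        current = current + 1 if answer == previous else 1
        longest = max(longest, current)
    return longest


def extreme_response_share(responses: list[int], scale_min: int, scale_max: int) -> float:
    """Share of answers at either endpoint of the response scale."""
    return sum(r in (scale_min, scale_max) for r in responses) / len(responses)


# Hypothetical answers of two respondents to six items on a 1-5 scale
straightliner = [3, 3, 3, 3, 3, 3]
extreme_responder = [1, 5, 5, 1, 5, 1]

for label, answers in [("straightliner", straightliner), ("extreme_responder", extreme_responder)]:
    print(
        label,
        response_variability(answers),
        longest_identical_run(answers),
        extreme_response_share(answers, scale_min=1, scale_max=5),
    )
```

Low variability combined with a long identical run flags a possible straightliner, while a high extreme-response share points to an extreme response style. Since every indicator has its own strengths and limitations, none of them should be read as proof of poor response quality on its own.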

Presenter:

Fabienne Kraemer

Registration (via Zoom)

Slides | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

This special session of the GESIS Meet the Experts format, “Meet the Editor”, is based on a Special Issue of the journal mda that explores and promotes scholarly reflection and transparency on emerging best practices in the collection, processing, analysis, and sharing of new data sources within the research community. This topic is of growing importance for the quantitative social sciences, given that social media and digital trace data, as well as data collected from sensors, are now widely used in many areas of research. In large part, the appeal of these data is that they offer fresh insight into new and ongoing research questions and give researchers access to ‘harder to reach’ geographies and populations. Notwithstanding their appeal, these data can often be difficult to obtain due to platform restrictions, are subject to potentially multiple sources of unknown bias, and require extensive curation and cleaning prior to analysis. In this Special Issue, we bring together a range of papers that highlight the challenges and opportunities of working with these new and emerging digital trace data sources, and in particular the value they add when integrated with more traditional forms of data.

As well as showcasing papers that directly explore the ‘added’ value of these new data sources for investigating ongoing social science debates, the Special Issue incorporates an exciting new initiative in which authors share with readers their reflections on the process of producing their published work. Through these so-called ‘reflective appendices’, our authors explicitly confront and interrogate the assumptions and processes that guided their analysis and explain where and how these changed during the course of their research. Questions addressed include whether decisions regarding sampling frames and sizes were adjusted, whether alternative forms of data were considered and then adopted or discarded, whether the methodological and analytical approaches designated at the start were modified in any way to deal with issues in working with new data sources, and what ‘lessons learned’, if any, they would pass on to others in the field to improve the robustness of future studies seeking to exploit these new forms of data. In the longer term, we hope the Special Issue, and particularly our authors’ reflective appendices, will help promote the ongoing shift toward embedding a more ‘open research’ culture within the social sciences.

The session will bring together guest editors Trent D. Buskirk and Rachel Gibson and Special Issue authors in a discussion about how their work contributes to a better understanding of the challenges and solutions in working with new data sources, and what they see as the value of the reflective appendices, both for their own work and for the discipline more generally.


Presenters:

Trent D. Buskirk,

Rachel Gibson (Guest Editors) 

in conversation with authors

Registration (via Zoom)

Slides | Presentation on YouTube | MTE Playlist

The lecture will be held in English.

The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), provides researchers with practical tools and tutorials for evaluating and enhancing the quality of survey, digital behavioral, and linked data. In this talk, we present AreaMatch, an open-source, R-based tool that links survey data containing location information to contextual indicators tied to different spatial units, e.g., linking the population density of municipalities to respondents’ self-reported postal codes. The tool implements three matching techniques: centroid linkage, areal matching, and area-weighted interpolation. This lets users compare the accuracy, consistency, and analytical implications of each technique. A built-in evaluation routine quantifies differences in the linking results and their influence on substantive research questions, such as correlations between population density and political attitudes. This presentation will walk through the tool’s workflow, highlight comparative results, and conclude with practical guidance on choosing and validating spatial linkage approaches.
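The difference between assigning a single unit and area-weighted interpolation can be seen in a small example. The Python sketch below assumes that the overlap areas between one postal code area and the municipalities it intersects have already been computed (in practice with a GIS library). The function names, the largest-overlap assignment rule used as a simple single-unit contrast, and all numbers are hypothetical illustrations, not AreaMatch's implementation.

```python
def area_weighted_value(overlaps: dict[str, float], values: dict[str, float]) -> float:
    """Area-weighted mean of an intensive variable (e.g., population density) for one target area."""
    total_area = sum(overlaps.values())
    return sum(values[unit] * area for unit, area in overlaps.items()) / total_area


def largest_overlap_value(overlaps: dict[str, float], values: dict[str, float]) -> float:
    """Assign the value of the single source unit with the largest overlap (hypothetical contrast rule)."""
    best_unit = max(overlaps, key=overlaps.get)
    return values[best_unit]


# Hypothetical overlaps (km²) between one postal code area and two municipalities
overlaps = {"municipality_A": 6.0, "municipality_B": 4.0}
# Hypothetical population densities (inhabitants per km²)
density = {"municipality_A": 1000.0, "municipality_B": 200.0}

print(largest_overlap_value(overlaps, density))  # 1000.0
print(area_weighted_value(overlaps, density))    # 680.0
```

Because population density is an intensive variable, it is averaged with overlap-area weights; a count variable such as the number of inhabitants would instead be apportioned in proportion to each municipality's area share. With these made-up figures, the two approaches assign 1000 versus 680 inhabitants per km² to the same respondent, which is the kind of discrepancy a comparison of linkage techniques is meant to surface.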

Presenter: 

Anne Stroppe

Registration (via Zoom)

Slides | Presentation on YouTube | MTE Playlist

This lecture will be held in English.

The KODAQS Data Quality Toolbox, developed by the Competence Center for Data Quality (KODAQS), equips researchers with practical tools and tutorials for assessing and improving data quality across survey, digital behavioral, and linked data. In this talk, we highlight TextPrep, a tool designed to assess how preprocessing methods, such as automated translation, minor text operations, and stopword removal, can significantly improve the quality of social media data, depending on the use case, data type, and method. By systematically evaluating and comparing different approaches (e.g., different stopword lists), the tool shows how they can alter textual content and affect data interpretation and quality. Text comparison measures, such as word counts or cosine similarity, are used to document differences between the various preprocessing strategies and packages. In addition, Structural Topic Modeling is applied to compare different preprocessing stages in terms of semantic coherence and exclusivity. With TextPrep, all of this can be assessed and implemented in an automated process through commented R code, which can be adapted and transferred to different use cases.
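The kind of comparison described above can be sketched in a few lines of Python: the same text is preprocessed with and without a stopword list, and the two versions are compared by token count and by the cosine similarity of their word-count vectors. The tiny stopword list, the example sentence, and the function names are hypothetical; TextPrep itself works with commented R code and established preprocessing packages.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = frozenset({"the", "a", "is", "and", "of", "to"})


def preprocess(text: str, stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


def cosine_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Cosine similarity between the word-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


text = "The vote is a duty and the vote is a right"
raw = preprocess(text)
cleaned = preprocess(text, STOPWORDS)

print(len(raw), len(cleaned))                         # 11 4
print(round(cosine_similarity(raw, cleaned), 3))      # 0.562
```

Running the same comparison with two different stopword lists, or before and after translation, shows how much each preprocessing choice changes the text representation that downstream models will see.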

Presenter: Yannik Peters

Registration (via Zoom)

Slides | Presentation on YouTube | MTE Playlist

This lecture will be held in English.

The Total Error Sheets for Datasets (TES-D) offer a structured, template-based tool for documenting datasets collected from online platforms. Particularly in Computational Social Science, digital behavioral data (DBD) from social media platforms like Facebook or X (formerly Twitter) or content platforms like YouTube and Wikipedia are frequently collected and studied. Designed to guide researchers through a critical reflection on the data collection process, TES-D helps identify and document sources of bias and error that affect the intrinsic quality of the resulting datasets. Inspired by documentation practices in Machine Learning and error frameworks from the social sciences, TES-D is built around a catalogue of questions targeted at different phases of the data collection process. By promoting transparency and standardization, TES-D supports FAIR data principles and encourages higher quality and more reusable digital behavioral data across disciplines.

Presenter: Leon Fröhling