GESIS Training

In 2023, 37 onsite and online workshops took place. The workshops were held partly in English and partly in German and covered a large part of the thematic spectrum of empirical social research.

YouTube is the largest and most popular video platform on the internet. The producers and users of YouTube content generate huge amounts of data. These data are also of interest to researchers (in the social sciences as well as other disciplines) for studying different aspects of online media use and communication. Accessing and working with these data, however, can be challenging. In this workshop, we will first discuss the potential of YouTube data for research in the social sciences, and then introduce participants to the YouTube API as well as different tools for collecting YouTube data. Our focus for the main part of the workshop will be on using R for collecting, processing, and analyzing data from YouTube (using various packages). Regarding the type of data, we will focus on user comments but will also (briefly) look into other YouTube data, such as video statistics and subtitles. For the comments, we will show how to clean/process them in R, how to deal with emojis, and how to do some basic forms of automated text analysis (e.g., word frequencies, sentiment analysis). While we believe that YouTube data have great potential for research in the social sciences (and other disciplines), we will also discuss the unique challenges and limitations of using these data.
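To make this concrete, here is a minimal sketch of the comment-collection step, assuming the tuber package (one of several R clients for the YouTube Data API); the app credentials and video ID are placeholders you would replace with your own.

```r
# Minimal comment-collection sketch with the tuber package; credentials and
# video ID below are placeholders, not real values.
library(tuber)

# Authenticate against the YouTube Data API (opens a browser once)
yt_oauth(app_id = "MY_APP_ID", app_secret = "MY_APP_SECRET")

# Download all comments for one video
comments <- get_all_comments(video_id = "SOME_VIDEO_ID")

# A first, very rough look: number of comments and most frequent words
nrow(comments)
words <- tolower(unlist(strsplit(comments$textOriginal, "\\s+")))
head(sort(table(words), decreasing = TRUE), 10)
```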

The workshop focuses on the practical conduct of qualitative interviews. To this end, it first covers the basics of interview types, interview design, and sampling. The core of the workshop then consists of exercises on constructing an interview guide and on various interviewing techniques; these exercises help make the interview situation comprehensible as a complex interaction. By addressing online and telephone interviews, we also take up the research situation in times of a pandemic. The workshop closes with an outlook on qualitative content analysis and objective hermeneutics as analysis methods and a discussion of ethical aspects of qualitative interview studies.

R is a powerful, versatile, and open-source software environment for statistical computing. With R, it is possible to manage and transform data, perform a plethora of statistical analyses, and visualize scientific results. However, using R for the first time can be daunting. R is a programming language and thus works differently than many commercial statistical software packages that primarily use graphical user interfaces (e.g., SPSS, Excel).  In this three-day workshop, we will introduce R to researchers with no or very little prior experience with R.

Any data analysis is based on a large number of decisions. These decisions relate to, among other things, study design, data preparation, and the selection and specification of statistical models (Rijnhart et al. 2021). A single analysis therefore represents only one possibility among a larger set of alternatives. This raises the question of how much the analysis results depend on these often undocumented choices.

The relatively new approach of multiverse analysis (Steegen et al. 2016) addresses two fundamental problems in research: the lack of transparency and the dependence of analysis results on data-analytic decisions (Young 2018). The idea of multiverse analysis is to conduct not just one, but ideally all (meaningful) analyses and to present the results in summary form. This makes the impact of data-analytic decisions on the results transparent and allows researchers to assess whether the conclusions are robust to alternative (modeling) decisions.

However, multiverse analysis can be time-consuming and resource-intensive, and it introduces new questions and challenges, especially when it comes to comparing the (many) results. On the other hand, it also makes aspects of the scientific process easier, as it relieves researchers of the burden of having to "sell" the best possible story.

In the course, the basics of multiverse analysis are explained step by step and applied to a real data example in Stata, with the results presented as a so-called specification curve (Simonsohn et al. 2020). Participants will gain practical experience in conducting a multiverse analysis. The focus is on applying multiverse analysis to already collected data.
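The course itself works in Stata; purely as an illustration of the idea, the following hand-rolled sketch in R crosses two data-analytic choices (covariate set and outlier handling), fits a model in every resulting "universe", and plots the sorted estimates as a specification curve. All data and variable names are simulated.

```r
# Hand-rolled multiverse: cross two data-analytic choices, fit a model in
# every resulting universe, and plot the sorted estimates.
library(ggplot2)

set.seed(1)
d <- data.frame(y = rnorm(500), x = rnorm(500), z1 = rnorm(500), z2 = rnorm(500))

covsets <- list("x", c("x", "z1"), c("x", "z2"), c("x", "z1", "z2"))
trims   <- list(identity,
                function(dd) dd[abs(as.vector(scale(dd$y))) < 3, ])

specs <- expand.grid(cov = seq_along(covsets), trim = seq_along(trims))
specs$estimate <- apply(specs, 1, function(s) {
  dd  <- trims[[s["trim"]]](d)
  fml <- reformulate(covsets[[s["cov"]]], response = "y")
  coef(lm(fml, data = dd))["x"]   # effect of interest in this universe
})

specs <- specs[order(specs$estimate), ]
specs$rank <- seq_len(nrow(specs))
ggplot(specs, aes(rank, estimate)) +
  geom_point() +
  labs(x = "Specification (sorted)", y = "Estimated effect of x")
```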

In empirical social research, a trend toward self-administered and, in particular, online surveys has been observable for quite some time. There is now a wide range of low-threshold services and tools for implementing one's own research ideas as web surveys and for recruiting samples of respondents. At the same time, experience shows that university training often covers the skills needed to program one's own surveys only marginally, if at all.

The workshop seeks to fill this gap and offers a practical introduction to creating online surveys, using the survey tool Unipark/EFS as an example. It aims to teach basic skills in approaching such a project and to convey knowledge of the essential elements of questionnaire programming. Topics will include planning the individual work steps, project types and question types, filters and plausibility checks, adapting the layout, testing the programming, and the most important steps at field start and data export. Many other functionalities can be discussed, and participants are encouraged to share topic requests in advance.

The workshop aims to convey the basic ideas and strategies of grounded theory methodology (GTM), one of the most widely used qualitative research methodologies, and to present and discuss different positions on GTM.

Guided by the questions and needs of the participants, the essential concepts and steps are covered, including theoretical sensitivity; open, axial, and selective coding; theoretical sampling; and theoretical saturation. In the exercises, central work steps are jointly tried out and reflected upon using the participants' own material.

Smartphone sensors (e.g., GPS, camera, accelerometer), apps, and wearables (e.g., smartwatches, fitness bracelets) allow researchers to collect rich behavioral data, potentially with less measurement error and lower respondent burden than self-reports through surveys. Passive mobile data collection (e.g., location tracking, call logs, browsing history) and respondents performing additional tasks on smartphones (e.g., taking pictures, scanning receipts) can augment or replace self-reports in surveys. However, there are multiple challenges to collecting these data: participant selectivity, (non)willingness to provide sensor data or perform additional tasks, privacy concerns and ethical issues, quality and usefulness of the data, and practical issues of implementation. This course will address these challenges by reviewing state-of-the-art practices of data collection with smartphone sensors, apps, and wearables, ranging from small-scale studies of hard-to-reach populations to large-scale studies producing official statistics, and discuss design best practices for this type of measurement. Questions addressed include:

  • What research questions can be answered using smartphone sensors, apps, and wearables?
  • What are participants’ concerns, and how to address them?
  • How to ask for consent for sensor measurements and ensure participation?

As part of this course, participants will have the chance to work on practical issues of implementing smartphone sensors, apps, and wearables into social science research. Participants will discuss their own research study designs using new technology and have the opportunity to present scenarios of combining survey data with data from health, accelerometry, and location sensors. The course will not discuss the analysis of sensor data, nor demonstrate how to program smartphone sensor apps.

During the workshop, participants will learn how to efficiently handle data management problems with Stata and how to avoid repetition by automating (and programming) tasks. The workshop is not an introduction to Stata, but will feature "best practice" Stata usage for modifying existing do-files (or creating new ones) so that they are reproducible, maintainable, and efficient. The tips and tricks refer mainly to data preparation and management, but they can also be used to automate data analysis. The workshop will present some ideas about these topics, but focus on interactive work in which participants learn to produce efficient Stata syntax themselves.

The workshop will provide a comprehensive methodological and practical introduction to event history analysis. Special attention will be devoted to applications in life course research concerned with the time-dynamic modeling of social processes. After clarifying basic concepts like states, time, events, and censoring, descriptive approaches like life tables and Kaplan-Meier estimation are discussed. Both continuous-time and discrete-time methods as well as parametric and semi-parametric regression models are introduced. Time-dependent covariates and time-varying effects are major features of survival models and will be discussed in detail. In addition, the workshop will cover a series of advanced topics like statistical inference with survival methods and survey data, multi-episode data, competing risk models, multilevel survival analysis, comparison of effects across models and groups, as well as effective visualization of model results. Substantive applications from sociological and demographic research will be used for illustration. The software package Stata will be used throughout the workshop, and hands-on exercises will help to deepen the acquired knowledge. Moreover, participants are encouraged to discuss their own work.

This course is devoted to a versatile form of interviewing: the expert interview. Expert interviews make it possible to gain condensed information about professional bodies of knowledge, such as insider knowledge of an organization, overview and contextual knowledge, or knowledge about processes. The workshop conveys basic perspectives of qualitative social research, shows you the possibilities of expert interviews, and familiarizes you with conducting expert interviews in practical exercises. Practical questions such as establishing contact are addressed, as are the construction of an interview guide and its use in the interview. We close with an outlook on analysis strategies. In addition, the lecturers offer feedback on participants' own project ideas and on challenges in their own projects.

Computational text analysis is a fast-growing method used in a wide range of research fields: A computer scientist might ask how to extract information from unstructured text data, a communication scientist might want to detect hate speech, and a political scientist might be interested in comparing party manifestos.

The workshop introduces key concepts and methods of quantitative computational text analysis using the programming language R, which will allow researchers to analyze large quantities of text data (“big data”) in an efficient and automated way. It is aimed at those who have little or no prior experience with computational text analysis but want to use text data in their own research.

Participants will learn about common pipelines for computational preprocessing of text, such as importing and cleaning text data as well as creating corpora and extracting features (e.g., word counts or sentiments). In addition, we will analyze and visualize text data using different methodological approaches, e.g., supervised and unsupervised machine learning. To this end, the workshop will provide hands-on exercises using R to study different text data sources (e.g., text data from social media). Participants can also work with data from their own research. Each session will consist of a short introduction by the lecturer followed by hands-on exercises.
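As a flavor of such a pipeline, here is a minimal sketch using the quanteda package, one common choice for this workflow in R:

```r
# Minimal preprocessing pipeline: corpus -> tokens -> document-feature
# matrix -> most frequent features.
library(quanteda)

texts <- c(doc1 = "Text analysis in R is fun.",
           doc2 = "Analyzing large amounts of text data in R is efficient.")

corp  <- corpus(texts)                         # create a corpus
toks  <- tokens(corp, remove_punct = TRUE)     # tokenize, drop punctuation
toks  <- tokens_remove(toks, stopwords("en"))  # remove English stopwords
dfmat <- dfm(toks)                             # document-feature matrix

topfeatures(dfmat)                             # word frequencies
```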

R is a versatile and powerful statistical programming language with rising importance in the social science community. It is free of charge, open-source, and numerous additional and freely available packages expand its functionality to a wide range of applications.

This workshop introduces the statistical programming language R. It is targeted at social scientists with no or very little prior knowledge of R. No advanced statistics skills are needed to participate, but some statistical knowledge (e.g., linear regression) is beneficial for following the examples.

All coursework will be done using the programming language R and the environment RStudio. Participants are kindly asked to install both on their machines beforehand. A manual on how to do this will be provided in advance. This will save us time during the class. If you have trouble installing the software, please do not hesitate to contact the lecturer before the course starts or ask during the course.

The course will start with an introduction to what packages and functions are and how to load data into R. We will go on with data wrangling, mainly using a package collection called "tidyverse". The course will also cover basic regression models, visualization of data, and the preparation of clean documentation in R Markdown. There will also be some time for specific requests from participants. Each unit will consist of an introduction by the lecturer, followed by hands-on exercises.
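For a taste of what this looks like in practice, here is a small, self-contained sketch using the built-in mtcars data (not a dataset used in the course):

```r
# Wrangle with dplyr, fit a linear model, plot with ggplot2.
library(tidyverse)

cars <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%   # subset rows
  mutate(wt_kg = wt * 453.6)     # new variable: weight in kilograms

fit <- lm(mpg ~ wt_kg, data = cars)   # a basic regression model
summary(fit)

ggplot(cars, aes(wt_kg, mpg)) +       # quick visualization
  geom_point() +
  geom_smooth(method = "lm")
```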

The workshop Applied Data Visualization introduces students to the theory and methods underlying data visualization. Data analysts face an ever-increasing amount of data (big data), and rather revolutionary technological developments allow researchers to visually engage with data in unprecedented ways. Hence, data visualization is one of the most exciting fields in data science right now. In this workshop, students acquire the skills to visualize data in R, both for exploratory purposes and for the purpose of explanation/presentation. We'll rely on R, the most popular statistical programming environment when it comes to visualization, and we'll make use of popular R packages such as ggplot2 and Plotly. Besides creating static graphs, we'll also have a look at interactive graphs and discuss how interactive visualization may revolutionize how we present data and findings.

Social science research is often faced with the problem that social phenomena (e.g., authoritarianism, anti-foreigner attitudes) are not directly observable. Such latent constructs must therefore be operationalized by means of measurement models. Structural equation modeling (SEM) is a procedure that can be used to empirically validate measurement models and to test causal relationships between latent variables.

The workshop introduces the logic of structural equation modeling and the basics of its application in empirical analyses. Participants will learn to use SEM with the lavaan package for R, working on examples from the ALLBUS data; a first code sketch follows the topic list below.

Topics include:

  • specification and estimation procedures
  • confirmatory factor analysis
  • path analysis
  • moderator and mediator analysis
  • multiple group analysis and testing for measurement equivalence
  • methodological pitfalls
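As a preview, here is a minimal confirmatory factor analysis in lavaan; it uses the package's bundled HolzingerSwineford1939 data rather than the ALLBUS data used in the workshop:

```r
# Minimal CFA with lavaan: two latent factors, each measured by three items.
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```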

R is the major language for statistical programming and data analysis. In its more than 30-year history, it went from an open-source implementation of its commercial predecessor S to being a staple of any data analyst's toolbox. The versatility of R lies in its immense extensibility, something we will discuss in this workshop.

This workshop is aimed at participants who already have some experience with R, but would like to further hone their skills in this language. We will cover advanced topics in R programming, such as functional and object-oriented programming, debugging, testing, and parallelization. The workshop's ultimate goal is to provide participants with a solid foundation in R programming, which will allow them to tackle more complex problems and design efficient workflows for data analysis.
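Two of these themes in miniature, functional programming and parallelization, sketched with base R and the parallel package:

```r
# Functional style: map a function over inputs instead of writing a for-loop;
# then spread the same map over multiple cores.
library(parallel)

slow_sqrt <- function(x) { Sys.sleep(0.01); sqrt(x) }

res_serial <- lapply(1:100, slow_sqrt)

# mclapply forks on Unix-alikes; on Windows, parLapply with a cluster
# does the same job.
res_parallel <- mclapply(1:100, slow_sqrt, mc.cores = 2)

identical(unlist(res_serial), unlist(res_parallel))  # same results, faster
```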

In recent years, many researchers have shown renewed interest in the spatially integrated social sciences, following calls for a 'spatial turn' in plenty of its subdisciplines. However, to process, visualize, and analyze geospatial data, social scientists must first be trained in specialized tools called Geographic Information Systems (GIS). The good news is: while this may have been an unfamiliar undertaking until recently, the familiar open-source statistical language R can now serve as a full-blown GIS for many research applications.

This course will teach participants how to use R's geospatial techniques in a social science context. We will learn about the most common data formats, their characteristics, and their applications. Most importantly, the course will present available data sources and show how to retrieve data and process them for further analysis. These steps involve essential geospatial operations, such as cropping, aggregating, or linking data, as well as first fundamental steps in modeling and assessing spatial interdependence. The course will be hands-on, so it also includes one of the most rewarding tasks of working with geospatial data: visualizing them through maps.
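A first geospatial workflow of this kind, sketched with the sf package and its bundled North Carolina shapefile:

```r
# Read, project, derive an attribute, and map geospatial data with sf.
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

nc <- st_transform(nc, 32119)                  # planar CRS (NC state plane)
nc$area_km2 <- as.numeric(st_area(nc)) / 1e6   # polygon areas in km^2

plot(nc["area_km2"])                           # a simple thematic map
```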

Creating professional-looking scientific reports and papers is a recurring, and often time-consuming, task for social scientists. This workshop introduces Quarto and the associated Markdown-based systems for generating automated reports using literate programming techniques to facilitate this process. Literate programming is the paradigm of combining code, its output, and its interpretation in a single document. For example, one can embed scripts for data analysis in R or Python in a Quarto file to generate tables and figures reproducibly. This makes it possible to create fully reproducible, beautiful documents with little effort.

In this course, we will cover the basics of writing automated reports with Quarto, as well as academia-specific topics such as bibliography generation and collaborative writing with non-technical individuals. Additionally, we will cover how Quarto can be used to create high-quality presentations and webpages, such as a personal or project page.
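To make the idea concrete, here is a minimal Quarto document; a sketch assuming only a standard Quarto installation with R. Saved as, say, report.qmd (a hypothetical file name), it renders with quarto render report.qmd:

````markdown
---
title: "A reproducible report"
format: html
---

The table below is generated by the embedded R chunk, so it updates
automatically whenever the data or code change.

```{r}
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)$coefficients
```
````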

In addition to a focused introduction to, or refresher of, the theoretical and practical foundations of multiple linear and binary logistic regression analysis, the workshop covers current debates. These include (1) adapting the application of regression analysis to research goals, with a focus on theory-driven, hypothesis-testing research, (2) the selection and specification of control variables, (3) the correct specification and interpretation of interaction effects in linear and binary logistic regression models, (4) the distinction between substantive significance and effect size versus mere statistical significance, and (5) the linear probability model and marginal effects as alternatives to logit coefficients and odds ratios in binary logistic regression models. Joint exercises illustrate the practical implementation with the statistical software Stata, using current social science research questions and cross-sectional data. The focus here is on the targeted execution and substantive interpretation of regression analysis in line with the state of current debates.

In recent years, more and more spatial data has become available, providing the possibility to combine otherwise unrelated data, such as survey data with contextual information, and to analyze spatial patterns and processes (e.g., spillover effects or diffusion).

Many social science research questions are spatially dependent, such as voting outcomes, housing prices, protest behavior, or migration decisions. Observing an event in one region or neighborhood increases the likelihood that we observe similar processes in proximate areas. As Tobler's first law of geography puts it: "Everything is related to everything else, but near things are more related than distant things". This dependence can stem from spatial contagion, spatial spillovers, or common confounders. Basic assumptions of standard regression models are therefore violated when analyzing spatial data. Spatial regression models can be used to detect this spatial dependence and explicitly model spatial relations, identifying spatial spillovers or diffusion.

The main objective of the course is the theoretical understanding and practical application of spatial regression models. The course will first give an overview of how to perform common spatial operations, such as aggregating spatial units, calculating distances, merging spatial data, and visualizing them. It will then focus on the analysis of geographic data and the application of spatial regression techniques to model and analyze spatial processes, and it addresses several methods for defining spatial relationships. The detection and diagnosis of spatial dependence and autocorrelation are demonstrated. Finally, we will discuss various spatial regression techniques to model spatial processes, clarify the assumptions of these models, and show how they differ in their applications and interpretations.
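A compressed sketch of these steps with the spdep and spatialreg packages, again using sf's North Carolina demo data: build a neighborhood structure, test for spatial autocorrelation, and fit a spatial lag model.

```r
# Neighborhood weights, Moran's I, and a spatial lag model.
library(sf)
library(spdep)
library(spatialreg)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

nb <- poly2nb(nc)                 # contiguity neighbors
lw <- nb2listw(nb, style = "W")   # row-standardized spatial weights

moran.test(nc$SID74, lw)          # is the outcome spatially clustered?

fit <- lagsarlm(SID74 ~ BIR74, data = nc, listw = lw)  # spatial lag model
summary(fit)
```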

The workshop Interactive Data Analysis with Shiny introduces participants to the basics of creating interactive apps with Shiny in R. Interactive data applications are becoming increasingly popular in academia, media, and companies to visualize, manage, and analyze data. Shiny is a tool for creating such (web) applications using R code; it allows you to create interactive data apps with no knowledge of HTML, CSS, or JavaScript. Interactive applications expand the ways existing data sets can be used and enable users to freely explore the data. The course offers an introduction to reactive programming and the R Shiny package, outlines a workflow for project management, discusses ways of hosting apps offline and online, and gives you the opportunity to start your own interactive data analysis project.
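The canonical minimal Shiny app illustrates the reactive idea: a slider input drives a plot that re-renders whenever the input changes.

```r
# A complete, runnable Shiny app in one file.
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Sample size:", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))  # reacts to the slider
  })
}

shinyApp(ui, server)
```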

Estimating causal effects is one of the central concerns of quantitative empirical social research. In research practice, often only non-experimental data are available, which complicate causal conclusions due to non-random selection. Current empirical social science research increasingly applies methods of modern causal analysis for non-experimental data, which are based on a clear understanding of causality and which explicitly address non-random selection. This workshop introduces these methods. In line with theory-driven empirical social research, the idea of causal hypotheses is explained as a starting point, and the goal of causal inference is distinguished from the alternative goals of description and prediction. Then, as the theoretical foundation for all methods, the counterfactual model of causality and the theory of causal graphs (DAGs, directed acyclic graphs) are introduced and practiced using practical examples. The implications for regression analysis are explained, such as the selection of control variables, causal-theoretical model building, and regression adjustment procedures. Building on this, an application-oriented introduction presents (propensity score) matching, entropy balancing, inverse probability weighting, instrumental variable estimators, regression discontinuity designs, and difference-in-differences estimators. The methods are practiced hands-on at the PC with the statistical software Stata, using social science data for the practical examples.

Are women more likely to attain a higher education degree than men? How does party preference vary across different social groups? Are Europeans concerned about climate change? In the social sciences, one often deals with categorical dependent variables. These can be variables whose values are dichotomous (e.g., attaining a degree: yes/no), nominal (e.g., party preference for CDU, SPD, FDP, or Greens), or ordinal (no worries, some worries, big worries). In this workshop, we will discuss regression models for analyzing such categorical variables. Topics include linear probability models, logistic regression models, probit and other link functions, goodness-of-fit assessment, and an introduction to ordinal logistic regression and multinomial regression models, as well as the presentation of results for reports and publications. The statistical concepts are introduced, applied, and deepened through hands-on sessions using the statistical software Stata. In addition, participants will conduct small research projects in which they will independently analyze their own data or ALLBUS data with categorical outcome variables in groups of two or three.

The workshop aims to convey the basic ideas and strategies of grounded theory methodology (GTM), one of the most widely used qualitative research methodologies, and to present and discuss different positions on GTM.

Guided by the questions and needs of the participants, the essential concepts and steps are covered, including theoretical sensitivity; open, axial, and selective coding; theoretical sampling; and theoretical saturation. In the exercises, central work steps are jointly tried out and reflected upon using the participants' own material.

This workshop introduces the theory and practice of qualitative interviews as an essential method of data collection in the social sciences.

The workshop centers, on the one hand, on an overview of common interview variants and their embedding in concepts of conversation management and theories of narration; in addition, questions of data protection, appropriate transcription/data preparation, and archiving are discussed.

The second focus of the workshop is on exercises in developing interview guides and conducting interviews (with video feedback). Materials (interview guides, etc.) brought by participants are gladly taken into account and discussed.

The workshop provides a basic introduction to qualitative network research. We start with the epistemological foundations of qualitative social research and then familiarize participants with the challenges of qualitative network studies. Participants gain insight into central concepts of network research and their historical development.

Through practical exercises, participants become familiar with methods for collecting and analyzing network data. They receive a basic introduction to the software VennMaker. By the end, they are sensitized to strategies of qualitative network research and its interfaces with standardized network research. Throughout the workshop, they get to know central studies in network research. Participants are encouraged to bring their own research projects and contribute them to the discussion.

Network research makes it possible to understand actions and orientations in terms of their structural embeddedness in webs of relationships. It asks, for example, about the position of managers in organizations and the room for maneuver that results from it, why some firms are more successful in the market than others, or which relationships are conducive to successful educational trajectories and subjective well-being. Network research can show how social relationships form, what effects the structure of networks and the position of actors have, and how networks constitute our possibilities for action. What is special about the qualitative network perspective is that it takes the actors' attributions of meaning as the starting point of these considerations and understands networks as socially constructed.

R is a software environment for statistical computing that is both powerful and versatile, as well as open source. It allows users to manage and manipulate data, conduct a wide range of statistical analyses, and present scientific results in various forms. However, for individuals who are new to R, the experience can be challenging as it is a programming language that operates differently from commercial statistical software packages like SPSS or Excel that primarily use graphical user interfaces.

This three-day workshop addresses researchers who have little to no prior experience with R. During this time, we will start by introducing R and the popular development environment RStudio. We will move at a slow pace, explaining the fundamental concepts of R usage, including basic programming concepts and how to use RStudio. Additionally, we will show participants how to extend R's capabilities to perform analyses using R packages. We will also cover the popular R package "tidyverse", which is useful for performing common data wrangling tasks, such as reading in, subsetting, and transforming data from various sources. Finally, we will use the "tidyverse" package to conduct basic exploratory data analysis and visualizations.
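A small sketch of these data wrangling steps with the tidyverse; the file name and column names below are placeholders for illustration, not course materials.

```r
# Read in, subset, transform, and summarize data with the tidyverse.
# "survey.csv" and the column "age" are hypothetical.
library(tidyverse)

d <- read_csv("survey.csv")

d %>%
  filter(!is.na(age)) %>%                   # subset rows
  mutate(age_group = if_else(age < 30, "young", "older")) %>%
  group_by(age_group) %>%                   # split ...
  summarise(n = n(), mean_age = mean(age))  # ... and summarize
```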

Throughout the workshop, participants will complete exercises that will provide them with reference material for common R programming tasks. We will also emphasize the use of online resources to help participants find answers to programming problems. By the end of the course, participants will have a solid understanding of the fundamentals of R, including how to work with tabular data, such as reading in, transforming, and analyzing data. Our goal is to equip participants with all the tools and resources they need to continue advancing their R skills on their own.

This course is an introduction to programming with Python, with a special focus on data analysis and machine learning, for which the programming language is known to be particularly powerful. Through morning lectures and afternoon applied sessions, participants will learn the fundamentals of programming as well as how to use Python as a powerful tool for data wrangling, data visualization, and data analysis. The objectives of the course are to give participants the tools needed for all basic programming tasks as well as an overview of the specific topics needed to carry out analyses for multiple data types.

Qualitative content analysis (QCA) is a method for analyzing various types of data; it is classically used for analyzing text data (e.g., interview transcripts). The analytic focus lies on systematizing the manifest and (within certain limits) latent content of the material in light of a specific research question. The essential instrument for this systematization is the category system, which usually consists of several main categories and subcategories that differentiate them. By now, a number of different QCA variants have become established. The workshop introduces qualitative content analysis according to Udo Kuckartz and Stefan Rädiker (2022), and the procedure of content-structuring qualitative content analysis is tried out in exercises. A content-structuring approach is considered the standard procedure of qualitative content analysis and can be found across authors in various textbooks. The goal is to describe the material to be analyzed with regard to specific content-related aspects. The variant by Kuckartz and Rädiker is characterized, among other things, by different ways of building categories, which will be addressed in the workshop.

In particular, the following topics are covered:

  • Characteristics and procedure of the method
  • Category building and types of categories
  • Developing category systems for content analysis and the requirements placed on them
  • Coding
  • Presentation of results
  • Quality criteria

In the course of the workshop, this procedure is situated within the landscape of qualitative content analysis research and the discourse on qualitative research.

Participants are invited to submit their own material in advance so that it can be used for the exercises.

Many social phenomena that we study in the social sciences follow an interaction logic. That means that the effect of an explanatory variable on an outcome differs depending on the value of a third variable. For example, the degree to which citizens are convinced by political messaging may depend on their party preference or their education.

This course will introduce students to best practices for modeling interaction effects in quantitative data and equip students with tools to visualize interaction effects using state-of-the-art graphical approaches. In detail, we will talk about how to include and interpret interaction terms in regression models, about other ways in which interaction logics can be included in regressions, and about how to visualize these effects to help interpret and communicate interaction effects in the data.
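A minimal illustration of the basic case, using simulated data: fit a linear model with a product term and plot predicted values to make the group-specific slopes visible.

```r
# The effect of x on y is allowed to differ by group m (simulated data).
library(ggplot2)

set.seed(42)
d <- data.frame(x = rnorm(300), m = sample(c("low", "high"), 300, TRUE))
d$y <- 0.2 * d$x + 0.6 * d$x * (d$m == "high") + rnorm(300)

fit <- lm(y ~ x * m, data = d)   # main effects plus their product term
summary(fit)

# Predicted values over a grid reveal the differing slopes
grid <- expand.grid(x = seq(-3, 3, 0.1), m = c("low", "high"))
grid$yhat <- predict(fit, newdata = grid)
ggplot(grid, aes(x, yhat, linetype = m)) + geom_line()
```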

The course will also deal with advanced and cutting-edge topics in modeling interactions. In interaction models, the control strategy is very important so that the interaction of interest does not erroneously reflect the effect of other interaction terms or nonlinear effects omitted from the statistical model. Participants will learn intuitive as well as advanced strategies for avoiding misattribution in interaction models, the latter in the form of regularized estimators such as the adaptive Lasso.

Finally, interaction effects are not always linear. Instead, it is possible that the effect of an explanatory variable varies across the values of a moderating variable in a nonlinear pattern, for example a U-shape. We will learn how to model and visualize nonlinear interactions and avoid erroneously inferring a nonlinear interaction pattern when there is none.

This course will consist of a mix of lectures and hands-on computer labs, where students can apply the learned material to data on society and politics.

This workshop introduces sequence analysis for social science research. Sequence analysis, originally developed in biology to analyze strings of DNA, has attracted increasing attention in the social sciences to analyze longitudinal data. Most applications in the social sciences study life course processes, including labor market careers, transitions to adulthood, or family formation.

This workshop covers longitudinal data management (only briefly), basic techniques of sequence analysis, as well as recent methodological developments tailored to social science research questions. Topics include different ways of calculating distances between sequences, cluster analysis after sequence analysis, sequence visualization, techniques for analyzing sequences’ multidimensionality, and the association between sequences’ unfolding over time and independent variables. All methods are demonstrated with hands-on examples using R. 
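As a flavor of the workflow, here is a minimal sketch with the TraMineR package and its bundled mvad data (school-to-work transitions): define the sequences, compute pairwise distances, and cluster them.

```r
# Sequence definition, optimal matching distances, and clustering.
library(TraMineR)
library(cluster)

data(mvad)
mvad.seq <- seqdef(mvad, var = 17:86)   # monthly states as a sequence object

seqdplot(mvad.seq, border = NA)         # state distribution plot

costs <- seqsubm(mvad.seq, method = "CONSTANT")  # constant substitution costs
dists <- seqdist(mvad.seq, method = "OM", indel = 1, sm = costs)

cl <- agnes(dists, diss = TRUE, method = "ward") # cluster the distance matrix
table(cutree(as.hclust(cl), k = 4))              # a four-cluster typology
```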

Statistical power is the probability of finding an effect if there truly is one to find. Unfortunately, it is often treated as a hoop to jump through so you can appease the ethics board or a reviewer. That’s a shame, because knowing what you power for means you know what your study is all about. Power analysis lets you calculate the sample size needed for your next study or the sensitivity of your already conducted study. Attending this workshop will give you a deeper understanding of what you’re after in your empirical analyses, rather than following heuristics. What heuristics? Ideally, after this workshop, you can immediately see what’s problematic about a sentence like this:

“Following previous work (random citation), we calculated power for a two-way ANOVA. Relying on a medium effect size, we needed 26 people per group for 80% power.”

Instead of heuristics, we’ll learn how to use simulations for calculating statistical power — but far more important, you’ll learn about the thinking behind a simulation to make all aspects of your study explicit and transparent. You’ll have everything you need to run a power simulation in R after half a day. But hopefully, you’ll come away from this workshop with a good understanding of why you simulated your power a certain way.
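The core of such a simulation fits in a few lines of R; here is a minimal sketch for a two-group mean comparison with an assumed standardized effect of 0.5:

```r
# Power by simulation: generate data under an assumed effect, test, repeat,
# and count how often the test comes out significant.
simulate_once <- function(n_per_group, d) {
  g1 <- rnorm(n_per_group, mean = 0)
  g2 <- rnorm(n_per_group, mean = d)
  t.test(g1, g2)$p.value
}

set.seed(123)
pvals <- replicate(5000, simulate_once(n_per_group = 50, d = 0.5))
mean(pvals < .05)   # estimated power; roughly .70 for n = 50, d = 0.5
```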

In terms of analysis, we won’t go beyond (repeated measures) interactions (aka regressions and ANOVAs). We won’t cover more complicated designs like multilevel models—although after visiting this workshop, you’ll have all tools to do that if you want to.

The workshop focuses on reproducible research in the quantitative social and behavioral sciences. In the context of this workshop, reproducibility means that other researchers can fully understand and rerun your data preparation and statistical analyses. However, the workflows and tools covered in this workshop will also facilitate your own work, as they allow you to automate and track analysis and reporting tasks. In addition to a conceptual introduction to the methods and key terms around reproducible research, this workshop focuses on procedures for maximizing the reproducibility of data analyses using R. After discussing essential definitions and dimensions of reproducibility, we will cover some computer literacy and project organization basics that are helpful for conducting reproducible research (e.g., folder structures, naming schemes, or command-line interfaces). After that, we will focus on version control, dependency management, and computational reproducibility. The tools we will use for that include Git and GitHub, R packages for dependency management, as well as Binder, a tool to package and share reproducible and interactive analysis environments.
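For the dependency-management part, renv is one such R package; a minimal sketch of its workflow:

```r
# Freeze the exact package versions a project uses with renv.
install.packages("renv")

renv::init()       # create a project-local library and lockfile
# ... develop your analysis, installing packages as usual ...
renv::snapshot()   # record exact package versions in renv.lock
renv::restore()    # later, or on another machine: reinstall those versions
```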

Bayesian methods for inference and prediction have become widespread in the social sciences (and beyond). Over the last decades, applied Bayesian modeling has evolved from a niche methodology with high computational and software-specific entry barriers to a readily available toolbox that virtually everyone can use by running pre-implemented packages in standard statistical software on generic PCs. Although Bayesian methods are now more accessible than ever before, aspiring Bayesian practitioners may be overwhelmed by questions and choices – including, but not limited to, when and why to use Bayesian methods in applied research, how to implement and interpret Bayesian analyses, or which software to use.

This workshop is designed to help participants take these first steps. It juxtaposes frequentist and Bayesian approaches to estimation and inference, highlights the distinct characteristics and advantages of Bayesian methods, and introduces participants to the Bayesian workflow and applied modeling using the R package brms – an accessible interface to the probabilistic programming language Stan, which allows users to perform Bayesian inference with state-of-the-art algorithms by running little more than a few lines of code in R.
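A first brms model can look like this; a minimal sketch using the built-in mtcars data, with a weakly informative prior chosen purely for illustration:

```r
# A Gaussian regression with brms, compiled to Stan behind the scenes.
library(brms)

fit <- brm(
  mpg ~ wt + hp,                 # formula syntax mirrors lm()
  data = mtcars,
  family = gaussian(),
  prior = set_prior("normal(0, 5)", class = "b"),  # illustrative prior
  chains = 4, iter = 2000, seed = 123
)

summary(fit)   # posterior means, credible intervals, convergence checks
plot(fit)      # trace plots and marginal posteriors
```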

Have you ever searched for the right file but couldn’t find it right away? Did you ever wonder why you coded a variable a certain way? Did you ever have doubts if it’s actually legal to re-use research materials from someone else? Or did you ever think that data protection regulations are just too complex and restricting?

Working with research data can be challenging for various reasons. This workshop is designed to assist researchers in managing research data within their research projects. To this end, we introduce basic concepts of data organization, data cleaning, and data documentation. Moreover, we provide insight into legal issues of data management in the social sciences, i.e., data protection regulations and intellectual property rights. These skills equip researchers to manage their data properly.

Having created and processed transparent and usable data within the project, it is just a few more steps to safely make data re-usable for others. Re-use of research data is of high relevance in the social sciences and is generally discussed under the label of Open Science. It enables others to re-use research materials, such as data, for new research purposes as well as for replicating research findings. We thus discuss concepts and ideas for the digital preservation of research data beyond the research project, as well as workflows of data sharing, to foster Open Science.

The wide-reaching and still growing digitalization of communication in the form of text data raises demands for international, cross-lingual comparative research. For example, large, multilingual text collections of political parties' campaign materials or politicians' parliamentary speeches invite cross-country comparative analysis of political behavior. Likewise, the availability of large collections of national news outlets' coverage of internationally highly relevant topics like economic inequality, climate change, or immigration allows the comparative analysis of various national perspectives.

Fortunately, an increasing number of contributions to the (computational) social science literature present approaches to analyze multilingual text collections with text-as-data methods. In this workshop, participants will learn about these approaches and strategies for studying social science-related concepts in multilingual text collections with automated content analysis methods. Specifically, we will focus on (machine) translation, multilingual embedding and transfer learning approaches.

We will focus on aspects relevant for applying these methods to compare concepts across socio-political contexts. Through a combination of theoretical discussions and practical exercises, participants will learn how to effectively apply (neural) machine translation and multilingual embedding techniques to analyze texts quantitatively across languages. Additionally, we will delve into the underlying assumptions that motivate these approaches and practice validating cross-lingual measurements.

By the end of the workshop, participants will have a strong understanding of key concepts and approaches in the existing multilingual text analysis literature, as well as the ability to implement them in R and/or Python through hands-on exercises.

Mediation analysis has been used by social scientists for the last 50 years to explain intermediate mechanisms between an assumed cause and effect. During these years, many advances in statistical mediation analysis were made, including the use of multiple mediators, models for limited dependent variables, latent variable modeling, improved standard errors, and the combination of mediation and moderation analysis. However, only very recently were the causal foundations and underlying assumptions of mediation analysis clarified. These more recent advances used potential outcomes notation and graphical causal models to illuminate the types of causal effects that can be estimated and, more importantly, the assumptions needed to recover an unbiased causal effect. This course will briefly review the traditional approaches to mediation analysis, then review fundamental topics in causal inference, and finally discuss the novel methods that fall under the rubric of "causal mediation analysis." The causal mediation methods put the assumptions of the analysis front and center, and because the causal assumptions are often untestable, tools like sensitivity analysis become important.
The course is mostly lecture-based, but will also provide numerous opportunities to practice the studied concepts using applied data examples in R.
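A sketch of what this looks like with the mediation package in R, using its bundled jobs example data: fit a mediator model and an outcome model, estimate the average causal mediation effect (ACME), and probe unobserved confounding with a sensitivity analysis.

```r
# Causal mediation analysis with the mediation package.
library(mediation)

data(jobs)  # example data shipped with the package

med.fit <- lm(job_seek ~ treat + econ_hard + sex + age, data = jobs)
out.fit <- lm(depress2 ~ treat + job_seek + econ_hard + sex + age, data = jobs)

med <- mediate(med.fit, out.fit, treat = "treat", mediator = "job_seek",
               sims = 1000)
summary(med)          # ACME, average direct effect, total effect

sens <- medsens(med)  # sensitivity of the ACME to unobserved confounding
summary(sens)
```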

The workshop will provide a comprehensive methodological and practical introduction to event history analysis. Special attention will be devoted to applications in life course research concerned with the time-dynamic modeling of social processes. After clarifying basic concepts like states, time, events, and censoring, descriptive approaches like life tables and Kaplan-Meier estimation are discussed. Both continuous-time and discrete-time methods as well as parametric and semi-parametric regression models are introduced. Time-dependent covariates and time-varying effects are major features of survival models and will be discussed in detail. In addition, the workshop will cover a series of advanced topics like statistical inference with survival methods and survey data, multi-episode data, competing risk models, multilevel survival analysis, finite mixture survival models, comparison of effects across models and groups, as well as effective visualization of model results. Substantive applications from sociological and demographic research will be used for illustration. The software package Stata will be used throughout the workshop, and hands-on exercises will help to deepen the acquired knowledge. Moreover, participants are encouraged to discuss their own work.
