GESIS Training

KODAQS Data Quality Academy Individual Training

With the Individual Training Program, the KODAQS Academy offers a free, flexible, self-paced, and self-guided learning resource for researchers who want to gain insights into data quality and learn how to identify, foresee, diagnose, fix, and frame potential data quality challenges. It also aims to equip researchers with knowledge of innovative tools and workflows for quality-assured data handling. The program was developed for researchers at all career levels who want to advance their data quality expertise in social science research, including doctoral candidates, early-career postdocs, employees of statistical institutions, and non-academic researchers in the social sciences.

Key Features

  • Flexible and fully online self-paced training to acquire data quality expertise
  • Separate learning paths with focus on three data types: survey data, digital behavioral data, and linked data
  • Possibility to easily select material suited to your interests
  • Extensive didactic materials (videos, slides, literature, scripts, code, etc.)

Registration

To access the learning resources of the Data Quality Individual Training, please register through our online registration form.

For further information contact us at kodaqs(at)gesis(dot)org.
 

Register here

Training Content and Materials

The KODAQS Academy Individual Training Program builds on the structure and contents of the Academy Certificate Program and includes all online modules and asynchronous resources. The training content is divided into four main topics: terms, concepts, and frameworks of data quality; indicators and metrics of data quality; remedies and corrections for identified quality problems; and tools and workflows for procuring, linking, processing, and evaluating data.

This topic covers four key areas of data quality, each illustrated with case studies across different data types.

  • Section 1: Introduction to fundamental data quality dimensions and methods such as decision trees for data quality assessment, focusing on the distinction between intrinsic and extrinsic data quality.
  • Section 2: Discussion of the measurement dimension of data quality, emphasizing how poor measurement decisions can lead to compromised data quality.
  • Section 3: Discussion of representation bias, where unbalanced data can lead to unfair outcomes.
  • Section 4: Focus on extrinsic data quality and bias, together with improvement strategies.
     

This topic is divided into four sections and focuses on the practical assessment of data quality by learning about indicators and metrics for survey data, digital behavioral data, and linked data. 

Section 1: Introductory overview, summarizing the Total Survey Error (TSE) framework with a focus on how to quantify error within it. The introduction also covers how to find R tools for assessing data quality.

Section 2: Focus on practical applications of quality indicators and metrics and the application of tools to assess representation error and the validity and reliability of measurement. 

Section 3: Focus on applying quality indicators and metrics to assess measurement error.

Section 4: Best practices for documenting and communicating data quality and data quality limitations.

This topic is divided into four thematic sections and covers mitigation strategies for data quality issues in survey data from a representation and measurement perspective. It explains how to assess to what extent errors can be corrected or reduced and how this can be done. 

  • Section 1: Discussion of basics of survey design for data quality.
  • Section 2: Imputation, i.e., the use of plausible values for missing responses.
  • Section 3: Weighting, i.e., applying coefficients that increase or decrease the influence of individual responses on survey statistics, in order to compensate for unequal selection probabilities in sampling, reduce the impact of systematic nonresponse, and make poststratification adjustments.
  • Section 4: Introduction to comparability and harmonization for survey data.
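To make the weighting idea concrete, here is a minimal Python sketch of poststratification with entirely made-up numbers (the groups, population shares, and responses are invented for illustration and are not taken from the course materials):

```python
# Poststratification sketch: reweight a sample so its group shares
# match known population shares, then compare the two estimates.

# Hypothetical sample of (age_group, response) pairs; the "young"
# group is deliberately overrepresented relative to the population.
sample = [("young", 1), ("young", 1), ("young", 0), ("young", 1),
          ("old", 0), ("old", 1)]

# Known population shares (assumed for this sketch).
population_share = {"young": 0.5, "old": 0.5}

# Shares of each group in the sample.
n = len(sample)
sample_share = {g: sum(1 for grp, _ in sample if grp == g) / n
                for g in population_share}

# Poststratification weight: population share divided by sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

unweighted = sum(r for _, r in sample) / n
weighted = (sum(weights[g] * r for g, r in sample)
            / sum(weights[g] for g, _ in sample))

print(round(unweighted, 3))  # 0.667
print(round(weighted, 3))    # 0.625
```

Because the overrepresented group receives a weight below 1 and the underrepresented group a weight above 1, the weighted estimate shifts toward what a balanced sample would have produced.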

This topic is divided into four thematic sections and covers mitigation strategies for data quality issues in digital behavioral data (DBD) from a representation and measurement perspective. 

  • Section 1: Overview of DBD sources with a focus on data quality. The introduction also lays the foundation for understanding what can and cannot be achieved with data quality mitigation strategies, and when error correction is required.
  • Section 2: Critical reflection on the interplay between tools and data quality.
  • Section 3: Focus on data donations, discussing how they can contribute to resolving some typical data collection problems.
  • Section 4: Focus on the transparent communication of data quality, highlighting best practices and scientific guidelines for transparent communication with a focus on DBD. 

This topic is divided into four thematic sections and covers mitigation strategies for data quality issues in linked survey data from a representation and measurement perspective. 

  • Section 1: Overview of mitigation strategies, highlighting best practices.
  • Section 2: Imputation, i.e., the use of plausible values for missing responses to enable analysis methods on a complete dataset.
  • Section 3: Weighting, i.e., applying coefficients that increase or decrease the influence of individual responses on survey statistics, in order to compensate for unequal selection probabilities in sampling, reduce the impact of systematic nonresponse, and make poststratification adjustments.
  • Section 4: Introduction to comparability and harmonization for combining multiple data sources. 
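As a concrete illustration of the imputation idea, the following Python sketch applies simple mean imputation to a made-up response vector. This is only the most basic strategy (the course materials cover more principled approaches); all values are invented:

```python
# Mean imputation sketch: replace missing responses (None) with the
# mean of the observed values so that complete-data analysis methods
# can be applied afterwards.
responses = [4, None, 3, 5, None, 4]

observed = [v for v in responses if v is not None]
mean = sum(observed) / len(observed)  # (4 + 3 + 5 + 4) / 4 = 4.0

imputed = [v if v is not None else mean for v in responses]
print(imputed)  # [4, 4.0, 3, 5, 4.0, 4]
```

Note that single-value imputation like this understates the uncertainty introduced by the missing data, which is one reason multiple-imputation methods exist.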

This topic is divided into four sections and focuses on modern tools and workflows for assessing data quality. It covers a range of tools that can be used to ensure that analyses are reusable, reproducible, and well documented. 

  • Section 1: Introduction to the basics of transparent and robust research practices.
  • Section 2: The use of tools for transparent and reproducible data processing and analysis.
  • Section 3: Introduction to specific tools to document mitigation strategies in the measurement process.
  • Section 4: Discussion of transparency, reproducibility, and ethical considerations.

This topic focuses on detecting and handling quality issues in data collected through survey research, with a primary focus on the quality of responses to Likert-scale questions. It gives an overview of response quality issues in survey research and of indicators to diagnose and deal with them.
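One well-known response-quality indicator for Likert batteries is straightlining, i.e., a respondent giving the identical answer to every item. As a hedged illustration (this is just one simple indicator, not necessarily the specific set covered in the course; the data are invented), a check can be sketched in Python:

```python
# Straightlining check: flag respondents who give the same answer to
# every item of a Likert battery. Respondent IDs and answers are
# made up for illustration.
battery = {
    "r1": [3, 3, 3, 3, 3],   # identical answers -> flagged
    "r2": [1, 4, 2, 5, 3],
    "r3": [2, 2, 3, 2, 2],
}

flagged = [rid for rid, answers in battery.items()
           if len(set(answers)) == 1]
print(flagged)  # ['r1']
```

In practice such a flag is usually combined with other indicators (e.g., response times) before responses are treated as low quality.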

This topic is divided into eight thematic sections and covers the full dataset creation pipeline typical of research with digital behavioral data, discussing aspects of data quality at each stage. It offers practical insights and best practices for carefully designing and implementing a data collection pipeline.

  • Section 1: Introduction to APIs and how they influence data quality
  • Section 2: Automated interactions with APIs
  • Section 3: The impact of basic text pre-processing on data quality
  • Section 4: Basic workflow for natural language processing
  • Section 5: Comparing the quality of tools for opinion analyses
  • Section 6: Collecting annotations - Introduction and “in-house” solutions
  • Section 7: Collecting annotations - Crowdsourcing
  • Section 8: Workflows for assessing the quality of (crowd) annotations
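To illustrate the point of Section 3, that seemingly innocuous preprocessing choices affect data quality, here is a minimal Python tokenizer sketch (the function and its options are invented for illustration; they are not a tool from the course materials):

```python
import re

def tokenize(text, lowercase=True, strip_punct=True):
    """Minimal tokenizer illustrating how preprocessing choices
    change the resulting tokens (an illustrative sketch only)."""
    if strip_punct:
        # Replace anything that is not a word character or whitespace.
        text = re.sub(r"[^\w\s]", " ", text)
    if lowercase:
        text = text.lower()
    return text.split()

doc = "The US won't act. Trust us!"
# Lowercasing conflates 'US' (country) with 'us' (pronoun), and
# stripping punctuation splits "won't" into "won" and "t".
print(tokenize(doc))
print(tokenize(doc, lowercase=False))
```

Depending on the downstream task, such conflations can either be harmless normalization or a real measurement problem, which is why preprocessing belongs in a data quality assessment.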
     

This topic is divided into eight thematic sections and provides a practical introduction to record linkage. It starts with an overview of the data linkage process including the steps of preprocessing, reduction of the search space, comparison, and classification into links and non-links. It also discusses survey harmonization techniques, linkage consent, and privacy-preserving record linkage. The topic covers the traditional case of linking several surveys as well as linking surveys with non-traditional data types such as social media and geospatial data.

  • Section 1: The data matching process
  • Section 2: Linkage consent, ethics, and privacy
  • Section 3: Preprocessing & indexing
  • Section 4: Comparison methods & classification
  • Section 5: Match evaluation
  • Section 6: Matching beyond survey data
  • Section 7: Privacy-preserving data matching
  • Section 8: Harmonizing survey questions, using political interest questions as an example
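The pipeline described above (preprocessing, reduction of the search space, comparison, classification into links and non-links) can be sketched end to end in a few lines of Python. All records, field names, and the similarity threshold below are invented for illustration and do not come from the course materials:

```python
from difflib import SequenceMatcher

# Two hypothetical record files to be linked: (id, name, year of birth).
file_a = [{"id": "a1", "name": "Anna Schmidt", "yob": 1980},
          {"id": "a2", "name": "Jonas Meyer",  "yob": 1975}]
file_b = [{"id": "b1", "name": "Ana Schmit",   "yob": 1980},
          {"id": "b2", "name": "Jonas Meier",  "yob": 1975},
          {"id": "b3", "name": "Clara Weber",  "yob": 1990}]

def normalize(name):
    # Preprocessing: harmonize case and whitespace.
    return " ".join(name.lower().split())

def block_key(rec):
    # Indexing/blocking: only compare records that share a year of
    # birth, shrinking the search space from |A| x |B| comparisons.
    return rec["yob"]

def similarity(r1, r2):
    # Comparison: approximate string similarity on normalized names.
    return SequenceMatcher(None, normalize(r1["name"]),
                           normalize(r2["name"])).ratio()

# Classification: pairs above the threshold are declared links.
THRESHOLD = 0.8
links = [(a["id"], b["id"])
         for a in file_a for b in file_b
         if block_key(a) == block_key(b) and similarity(a, b) >= THRESHOLD]
print(links)  # [('a1', 'b1'), ('a2', 'b2')]
```

Real record linkage systems use more robust comparison functions and evaluate the classification against ground truth (Section 5), but the four stages are the same.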
