It is well known that survey data are plagued by non-substantive variation arising from a myriad of sources, such as response styles, socially desirable responding, failure to understand questions, and even faked and/or duplicated interviews. In general, one can say that all data contain both substantive and non-substantive variation. Modifying Box’s (1987) famous quote that "[essentially] all models are wrong, but some are useful", I suggest that "all data are dirty, but nevertheless some are informative". But what is "dirty" or "poor" data? The guiding rule is that the lower the share of substantive variation, the poorer the quality of the data. Applying scaling techniques, I would like to give some examples of how to assess the quality of the data, i.e. how to detect non-substantive sources of variation.
There are three sources of "poor" data: the respondents, the interviewers, and the institutes collecting the data. In the lecture, I will introduce the theory of "simplification" (Blasius and Thiessen 2012), which can be understood as an extension of Krosnick’s theory of "satisficing" that includes the interviewers as well as the institutes as sources of low data quality. As empirical examples I will use, among others, the PISA data and the European Social Survey.