Regional Context Data with the Microcensus

Small-scale context data are relevant explanatory factors for many social science questions. For most applications, such data are useful only if they can be linked to individual-level data, which requires spatial identifiers at the individual level. In addition, the context data should have low measurement error and be available at the smallest possible spatial level. In Germany, the Microcensus is particularly suitable as a source of context data: a broad set of indicators can be derived from it, which are in principle available at the municipality level and, with the introduction of household geo-coordinates, potentially at an even finer level.

With this in mind, the purpose of this project was to show how spatially continuous regional context data can be estimated from the Microcensus and linked to other data sources at the coordinate level. In a first step, using the Microcensus Regionalfile 2000 and a special spatial kernel density estimation procedure (Groß et al. 2017), it was possible to estimate large-area continuous distributions at the coordinate level with satisfactory sampling error. Since the regional file is only a subsample and contains information only at the level of Microcensus county regions, a second step attempted to implement the procedure with the full Microcensus 2015 via on-site access. This implementation was discontinued because of problems linking the municipality shapefiles to the municipalities and because of insufficient coverage of the Microcensus at the municipality level. Finally, data from the 2011 Census and the 2011 Microcensus SUF were used to examine the spatial level at which variation in various sociodemographic variables arises. For most of the variables considered, spatial variation occurs below the municipality level, so aggregating at the municipality level loses this information.
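
To make the idea of a spatially continuous estimate more concrete, the following is a minimal sketch of an ordinary two-dimensional Gaussian kernel density estimate evaluated on a regular grid. It is not the procedure of Groß et al. (2017), which additionally deals with coordinates that are only known at an aggregated level; it assumes exact point coordinates. The function name `gaussian_kde_grid`, the simulated `households` coordinates, and the bandwidth value are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def gaussian_kde_grid(points, grid_x, grid_y, bandwidth):
    """Evaluate a plain 2D Gaussian kernel density estimate on a regular grid.

    Assumes exact point coordinates and one isotropic bandwidth; this is a
    simplification, not the estimator used in the project.
    """
    points = np.asarray(points, dtype=float)              # shape (n, 2)
    gx, gy = np.meshgrid(grid_x, grid_y)                  # shape (len(grid_y), len(grid_x))
    grid = np.column_stack([gx.ravel(), gy.ravel()])      # shape (m, 2)
    diff = grid[:, None, :] - points[None, :, :]          # shape (m, n, 2)
    sq_dist = np.sum(diff ** 2, axis=2)                   # squared distances, shape (m, n)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1).reshape(gx.shape)          # average kernel over all points

# Simulated (hypothetical) household coordinates in arbitrary units:
# a dense cluster around (2, 2) and a more dispersed one around (6, 5).
rng = np.random.default_rng(0)
households = np.vstack([
    rng.normal([2.0, 2.0], 0.5, size=(200, 2)),
    rng.normal([6.0, 5.0], 1.0, size=(100, 2)),
])

grid_x = np.linspace(-4.0, 12.0, 81)
grid_y = np.linspace(-4.0, 12.0, 81)
density = gaussian_kde_grid(households, grid_x, grid_y, bandwidth=0.6)

# Sanity check: a density should integrate to roughly one over the grid.
cell_area = (grid_x[1] - grid_x[0]) * (grid_y[1] - grid_y[0])
total_mass = density.sum() * cell_area
```

Because the result is defined at every grid point rather than per administrative unit, it could in principle be matched to any other data source that carries coordinates. The bandwidth controls how strongly the estimate is smoothed and would have to be chosen with care in a real application.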

Thus, the Microcensus is currently not suitable for estimating spatial context data. In principle, this may improve once the geo-coordinates collected since 2018 are published. However, the Microcensus is a cluster sample of households from randomly drawn selection districts (“Auswahlbezirke”), so the exact location of each household may not add much information: there is little spatial variation among the households within a selection district.
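
The argument about clustering can be illustrated with a simple decomposition of variance into a between-group and a within-group part. The sketch below is an assumption-laden illustration, not an analysis of Microcensus data: the helper `between_share`, the number of districts, the households per district, and the spatial spreads are all hypothetical values chosen only to show the mechanics.

```python
import numpy as np

def between_share(values, groups):
    """Share of the total sum of squares that lies between group means.

    A share close to 1 means that knowing the group (for example the
    selection district) already captures almost all of the variation.
    Assumes the values are not all identical.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    total_ss = np.sum((values - grand_mean) ** 2)
    between_ss = 0.0
    for g in np.unique(groups):
        group_values = values[groups == g]
        between_ss += group_values.size * (group_values.mean() - grand_mean) ** 2
    return between_ss / total_ss

# Hypothetical setup: 50 selection districts spread over a 50 x 50 km region,
# each with 9 households located within roughly 0.1 km of the district centre.
rng = np.random.default_rng(1)
n_districts, n_households = 50, 9
centres = rng.uniform(0.0, 50.0, size=(n_districts, 2))
district_id = np.repeat(np.arange(n_districts), n_households)
coords = centres[district_id] + rng.normal(0.0, 0.1, size=(district_id.size, 2))

share_x = between_share(coords[:, 0], district_id)
share_y = between_share(coords[:, 1], district_id)
```

Under these assumed numbers, almost all variation in the household coordinates lies between districts, which mirrors the point that household-level coordinates add little beyond the location of the district. The same decomposition could be applied to a sociodemographic variable grouped by municipality to see how much variation remains within municipalities.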

Groß, M.; Rendtel, U.; Schmid, T.; Schmon, S.; Tzavidis, N. (2017): Estimating the density of ethnic minorities and aged people in Berlin: Multivariate kernel density estimation applied to sensitive georeferenced administrative data protected via measurement error. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(1), 161–183. https://doi.org/10.1111/rssa.12179

Pforr, K. (2021): Regionale Kontextdaten mit dem Mikrozensus. GESIS Papers 2021/02. Köln: GESIS - Leibniz-Institut für Sozialwissenschaften. doi: http://dx.doi.org/10.21241/ssoar.71319.