* 2016_03-18_cross_eu_silc_hhld_reg_d.do
*
* STATA Command Syntax File
* Stata 15.1;
*
* Transforms the EU-SILC CSV-data (as released by Eurostat) into a Stata systemfile
*
* EU-SILC - Cross 2016 Version March 2018
* Household register file:
* This version of the EU-SILC has been delivered in the form of separate country files.
* The following do-file transforms the raw data into a single Stata file using all available country files.
* Country files are delivered in the format UDB_c*country_stub*16D.csv
*
*
* PLEASE NOTE
* For differences between data as described in the guidelines
* and the anonymised user database as well as country-specific anonymisation measures see:
* C-2016 DIFFERENCES BETWEEN DATA COLLECTED.doc
*
* (c) GESIS 2018-02-26
* GESIS - Leibniz Institute for the Social Sciences
* German Microdata Lab
* Valentina Ponomarenko
* https://www.gesis.org/gml/european-microdata/eu-silc/
*
* Contact: valentina.ponomarenko@gesis.org

/* Initialization commands */

clear
capture log close
set more off
version 15.1
set linesize 250
set varabbrev off
#delimit ;

* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ;
* CONFIGURATION SECTION - Start ;

* The following command should contain the complete path and
* name of the Stata log file.
* Change LOG_FILENAME to your filename ;

local log_file "LOG_FILENAME" ;

* The following command should contain the complete path where the CSV data files are stored
* Change CSV_PATH to your file path (e.g.: C:/EU-SILC/Crossectional 2004-2016)
* Use forward slashes and keep path structure as delivered by Eurostat CSV_PATH/COUNTRY/YEAR ;

global csv_path "CSV_PATH" ;

* The following command should contain the complete path and
* name of the STATA file, usual file extension "dta".
* Change STATA_FILENAME to your final filename ;

local stata_file "STATA_FILENAME" ;

* CONFIGURATION SECTION - End ;

* There should probably be nothing to change below this line ;
* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ;

* Loop to open and convert csv files into one dta file ;

tempfile temp ;
save `temp', emptyok ;

foreach CC in AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IT LT LU LV MT NL NO PL PT RO RS SE SI SK UK { ;
    cd "$csv_path/`CC'/2016" ;
    import delimited using "UDB_c`CC'16D.csv", case(preserve) clear ;
    * In DE, NL, PT and RS DB040 is missing and read as numeric.
    * To prevent errors in the append command, it needs to be set to string ;
    tostring DB040, replace ;
    append using `temp', force ;
    save `temp', replace ;
} ;

* No info on region is available for DE, NL, PT, RS ;
replace DB040="no info" if DB040=="." ;

* Countries in data file are sorted in alphanumeric order ;
sort DB020 ;

log using "`log_file'", replace text ;

* Note that some variables in the csv-data file might be in lowercase.
* The following loop ensures that the dataset contains only variable names in uppercase ;

foreach var of varlist _all { ;
    local newname = upper("`var'") ;
    cap rename `var' `newname' ;
} ;

* Definition of variable labels ;

label variable DB010 "Year of the survey" ;
label variable DB020 "Country alphanumeric" ;
label variable DB030 "Household ID" ;
label variable DB040 "Region" ;
label variable DB040_F "Flag" ;
label variable DB060 "PSU-1 (First stage)" ;
label variable DB060_F "Flag" ;
label variable DB062 "PSU-2 (Second stage)" ;
label variable DB062_F "Flag" ;
label variable DB070 "Order of selection of PSU" ;
label variable DB070_F "Flag" ;
label variable DB075 "Rotational Group" ;
label variable DB075_F "Flag" ;
label variable DB090 "Household cross-sectional weight" ;
label variable DB090_F "Flag" ;
label variable DB100 "Degree of urbanisation (EE, LV:1,2 = 1; MT:2,3 = 2) " ;
label variable DB100_F "Flag" ;

* Definition of category labels ;

label define DB040_F_VALUE_LABELS
    -1 "missing"
    1 "filled according to NUTS-13"
;
label define DB060_F_VALUE_LABELS
    1 "Rotation is implemented at PSU level (the PSU rotates in and out of the sample)"
    2 "Rotation is implemented at SSU or household level (The PSU remains in the sample for the entire duration of EU-SILC)"
    -2 "not applicable"
;
label define DB070_F_VALUE_LABELS
    1 "filled"
    -2 "not applicable"
    12 "order on sampl.frame is fixed for all EU-SILC survey years; PSUs have an unequal probability of selection (within explicit strata)"
    21 "order on sampl.frame may change over time/PSUs have equal probability of selection (within explicit strata)"
    22 "order on sampl.frame may change over time/PSUs have unequal probability of selection (within explicit strata)"
;
label define DB075_F_VALUE_LABELS
    1 "filled"
    -2 "na (no rotational design is used)"
;
label define DB090_F_VALUE_LABELS
    1 "filled"
    -7 "not applicable: DB010 not equal last year"
;
label define DB100_VALUE_LABELS
    1 "densely populated area"
    2 "intermediate area"
    3 "thinly populated area"
;
label define DB100_F_VALUE_LABELS
    1 "filled"
    -1 "missing"
;

* Attachment of category labels to variable ;

label values DB040_F DB040_F_VALUE_LABELS ;
label values DB060_F DB062_F DB060_F_VALUE_LABELS ;
label values DB070_F DB070_F_VALUE_LABELS ;
label values DB075_F DB075_F_VALUE_LABELS ;
label values DB090_F DB090_F_VALUE_LABELS ;
label values DB100 DB100_VALUE_LABELS ;
label values DB100_F DB100_F_VALUE_LABELS ;

compress ;
save "`stata_file'", replace ;

log close ;
set more on ;
#delimit cr
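
* Optional post-run checks: a minimal sketch, not part of the original GESIS routine.
* Assumption: the merged data are still in memory, and DB020 plus DB030 are expected
* to identify households uniquely within a single cross-sectional year.
* The log is already closed at this point, so this output is not written to the log file.
describe, short
tabulate DB020, missing
duplicates report DB020 DB030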