Week 2 (10-14 October 2022)
Lecturers: Shelmith Kariuki; Ahmadou Dicko, PhD
About the lecturers
Shelmith Nyagathiri Kariuki is a Senior Data Analyst based in Nairobi, Kenya. She recently worked as a Research Manager at GeoPoll and as a Data Analyst at Busara. Shelmith has previously worked as an assistant lecturer at various Kenyan universities, teaching units in Statistics and Actuarial Science. She holds a BSc in Actuarial Science and an MSc in Applied Statistics from JKUAT. Shelmith has extensive experience in data analysis using R and Python and is at the forefront of the AfricaR initiative, which strives to improve the representation of Africans in the global R community by encouraging, inspiring, and empowering Africans of all genders.
Ahmadou Dicko is a trained statistician and holds a PhD in climate change economics. He specializes in the use of statistics and data science in development and humanitarian projects. He led the OCHA Center for Humanitarian Data team for West and Central Africa, and his previous experience also includes using data-driven approaches to optimize the control of tropical diseases and to monitor food security in West Africa. Ahmadou is a statistical consultant and a certified RStudio trainer. He has worked with several organizations, such as FAO, IAEA, OCHA, CIRAD, and various governments in Africa.
Short course description
The course “Introduction to Data Analysis Using R” introduces participants to basic methods of data analysis with the open-source statistical software R. It covers how to import and handle data in R, basic uni- and bivariate statistical analysis, and multivariate statistical analysis (e.g., regression modelling).
The course will cover a) basics of statistical data analysis; b) importing data into R; c) preparing the data for analysis; d) conducting uni- and bivariate analysis (e.g., describing distributions and associations); e) conducting multivariate analysis (e.g., regression models); f) creating tables and graphs; and g) integrating R into a workflow of data analysis.
Each day consists of six hours of classroom instruction, combining lectures on the theoretical foundations of the statistical models with hands-on exercises that give participants the opportunity to work with real data.
The course is targeted at researchers and practitioners with a basic knowledge of statistics who work with survey data and want to use R as a tool for data analysis. In the course, participants will develop a) familiarity with R’s interface and facilities; b) an understanding of how to integrate R into their research projects; c) the skills to handle common data management problems; d) the skills to conduct uni- and multivariate analysis with R; and e) the skills to present their results using tables and graphs.
Course outline
- Refresher on statistical data analysis
- Introduction to R, importing data, preparing and manipulating data
- Uni- and bivariate statistics; using graphs to explore data
- Statistical inference (statistical tests, regression)
- Reproducible research in R
A full-length syllabus of this course will be made available closer to the course date.
Course prerequisites
Participants should have prior experience with data analysis, basic statistics, and regression, and should already have attended an introductory course in statistics. Experience with other statistical software packages is helpful but not required. Participants should also be familiar with basic computer use.
Target group
This course is for people who work with data and want to use R as their first programming language or as an additional tool. Participants will also gain an overview of what is possible with R.
Course and learning objectives
The workshop provides a hands-on introduction to R and lays the foundations for participants to develop their R programming skills independently. Participants can expect to gain an overview of the functional scope of R, to master the import and export of data, and to learn how to perform basic data analysis in R.
Hardware and software requirements
Please bring your own laptop for use in the course and, before the course begins, install the following R packages: knitr, Rcmdr, lme4, devtools, ctv, readxl, lattice, xlsx, tibble, haven, foreign, readstata13, org, rio, Hmisc, naniar, memisc, tidyverse, forcats, car, reshape, DT, dplyr, magrittr, jtools, Metrics, visreg, AmesHousing, corrplot, DAAG, caret, stargazer, faraway, ggplot2, pscl, MASS, arm, survey, svyPVpack, mlmRev, vioplot, beanplot, psych, AER, igraph, ggraph, plotly, kknn, maps, dbscan, rvest, rtweet, gapminder, mlbench, purrr, compare, ggmap, leaflet, maptools, raster, rgdal, colorRamps, sp, osmplotr, osmdata, tmap, io, kableExtra, tmaptools.
Recommended literature
- Chang, W. (2013). Cookbook for R. http://www.cookbook-r.com/
- Ismay, C. & Kim, A. Y. (2019). An Introduction to Statistical and Data Sciences via R. https://moderndive.com/
- Teetor, P. (2011). R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics. O'Reilly Media, Inc.
- Wickham, H. & Grolemund, G. (2017). R for Data Science. O'Reilly Media, Inc. https://r4ds.had.co.nz/