The Manifesto Corpus: A Digital Corpus of Electoral Programs

Nicolas Merz
1 December 2016, GESIS Mannheim (B2,8, right-hand side), 13:45

Abstract
The Manifesto Corpus is a digital, open-access, multilingual, annotated corpus of electoral programs. It complements recent methodological innovations in (semi-)computerized content analysis by providing a large, standardized text corpus for the political science community. The corpus is based on the collection of the Manifesto Project, which comprises the largest hand-annotated text corpus of electoral programs available. Since 2009, the project’s costly and time-intensive procedure of collecting and coding documents has been fully digitized. As a result, it now provides more than 1900 machine-readable documents from 40 different countries. Eight hundred of these documents are annotated at the level of single (quasi-)sentences, with codes that correspond to the Manifesto Project coding scheme. Additionally, the corpus is continually being extended by incorporating new elections and digitizing older documents. The corpus is stored and versioned in a standardized format in an online database. A web API, an R package (manifestoR) and a Stata add-on (manifestata) provide easy access to the corpus and support the replicability of analyses that use its data. Jirka Lewandowski, Sven Regel, Nicolas Merz and Pola Lehmann won the 2016 Statistical Software Award of the Society for Political Methodology for the R package manifestoR and the Manifesto Corpus. The talk will present illustrative applications of the corpus, as well as the technical infrastructure used to maintain, distribute and access it.

Biographical Notes
Nicolas Merz is a research fellow in the Manifesto Research on Political Representation (MARPOR) project at the WZB Berlin Social Science Center. Additionally, he is a PhD student at Humboldt University, working on a dissertation on electoral programs and media coverage. His research interests include political parties, electoral programs, political communication and content analysis.

Jirka Lewandowski is Assistant for Data Analysis and Infrastructure in the MARPOR project at the WZB Berlin Social Science Center. He developed and maintains the technical infrastructure for storing the Manifesto Corpus, as well as for the manifestoR software package. His interests are computer-based text analysis and open, reproducible scientific work and data flows.