Quantitative Text Analysis Using R

Organizers: Kenneth Benoit (London School of Economics) and Paul Nulty (London School of Economics)

Abstract:

This is a short workshop on using the R statistical environment to analyze texts and produce quantitative analyses of them. Our focus will be on the quanteda package, which we developed as a powerful and simple-to-use toolkit for the quantitative analysis of textual data.
Quanteda makes it easy to manage texts in the form of a corpus, defined as a collection of texts that includes document-level variables specific to each text, as well as metadata for documents and for the collection as a whole. It includes tools to manipulate the texts in a corpus quickly and easily by performing the most common natural language processing tasks, such as tokenizing, stemming, or forming n-grams. Its functions for tokenizing texts and forming multiple tokenized documents into a document-feature matrix are both extremely fast and extremely simple to use.
Quanteda can segment texts easily by words, paragraphs, sentences, or even user-supplied delimiters and tags.
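
The workflow described above (corpus, tokens, document-feature matrix, segmentation) might look roughly like the following sketch. It assumes a recent quanteda release (version 3 or later); function names in the version current at the time of the workshop may differ. The two example texts and the "party" document variable are invented purely for illustration.

```r
library(quanteda)

# Two toy texts (invented for illustration), named so the names become docnames
txts <- c(doc1 = "Taxes are rising. Voters are angry.",
          doc2 = "The economy is growing quickly.")

# Build a corpus with one document-level variable (hypothetical "party")
corp <- corpus(txts, docvars = data.frame(party = c("A", "B")))

# Tokenize into words, dropping punctuation
toks <- tokens(corp, remove_punct = TRUE)

# Stem the tokens, and separately form bigrams from the unstemmed tokens
toks_stem <- tokens_wordstem(toks)
toks_bi <- tokens_ngrams(toks, n = 2)

# Combine the tokenized documents into a document-feature matrix
dfmat <- dfm(toks_stem)

# Segment the corpus so that each sentence becomes its own document
corp_sent <- corpus_reshape(corp, to = "sentences")

print(dfmat)
print(ndoc(corp_sent))
```

Keeping the tokens object separate from the document-feature matrix lets the same tokenization feed several downstream steps (stemming, n-grams) without re-reading the texts.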

Organizational Details:

Date, Time & Location:

  • Course 1: Dec 1st, 13:00-15:00, GESIS Cologne, Room West I
  • Course 2: Dec 1st, 16:00-18:00, GESIS Cologne, Schulungsraum

Contact: Kenneth Benoit (London School of Economics) and Paul Nulty (London School of Economics)