Combining Manual and Computational Text and Content Analysis

Organizers: Cornelius Puschmann (HIIG) & Kashif Rasul (FU Berlin)

Abstract:

Computational approaches to content analysis are increasingly popular in many areas of social science, including media and communication research, education, political science and sociology. Textual data, both from traditional written sources and from social media, offer a wealth of knowledge that social scientists may want to explore, from journalistic framing in news and the evolution of scientific concepts in scholarly papers to language games and political debate on Facebook and Twitter. As part of the computational approach, we will introduce one of the most important areas of computational social science: natural language processing (NLP). NLP is ubiquitous because people communicate almost everything in language, from web searches and emails to forum posts and social media.

Recently, deep learning approaches to NLP have been shown to achieve very high performance across many NLP tasks. We will consider one such approach: word-vector representations combined with recurrent neural networks.

Despite these advances, computational analysis of textual data remains subject to certain pitfalls. Both data and methods need to be rigorously scrutinized for weaknesses such as coding errors and overfitting, and understood closely in qualitative terms before they can be interpreted at scale. Analyses are in danger of overinterpretation, rationalizing patterns that may reflect technical errors or chance fluctuations. Even more than the analysis of numerical metadata, text mining risks producing analyses that are unfalsifiable and unreproducible.

Our workshop will aim to strengthen the connection between manual and computational approaches to content analysis by doing the following:

● providing a critical overview of both off-the-shelf and experimental techniques for content analysis

● discussing how to train, debug and visualise NLP models, applying a range of techniques to different kinds of data

● describing approaches that integrate manual and computational techniques for content analysis

● systematically highlighting the strengths and weaknesses of different procedures

● critically discussing the methodology of textual analysis (formulating questions, selecting data, applying methods, interpreting results) in the broader context of computational social science

Organizational Details:

Target audience: Graduates in the social sciences and in computer science who wish to integrate manual and computational approaches to content analysis in their work.
Date and time: Dec 1st, 09:00-11:15
Location: GESIS Cologne, Room Ost
Contact:
Cornelius Puschmann, Alexander von Humboldt Institute for Internet and Society (HIIG), cornelius.puschmann(at)hiig(dot)de
Kashif Rasul, Department of Mathematics and Computer Science, Freie Universität Berlin, kashif(at)zedat.fu-berlin(dot)de