GESIS Leibniz Institute for the Social Sciences


The GESIS Computational Social Science (CSS) Seminar is an English-language (bi-)monthly event for expert exchange on data science and social analytics.

Upcoming Events

Upcoming talks will be announced soon.

Past Events

Acceptance of Smartphone App and Sensor Data Collection

Smartphones have recently gained popularity as tools for data collection in the social sciences. While smartphones can be used to administer app- or browser-based surveys, they also allow researchers to passively collect behavioral data from the operating system and built-in sensors, such as data about Internet use, mobility patterns, and proximity to other smart devices. Compared to self-reports of behaviors, passively collected data are more detailed and potentially more accurate, because they are less susceptible to recall error and social desirability. A major challenge of app- and sensor-based studies, however, is that participation rates in the general population are rather low. In this talk, Alexander Wenz will give an overview of the factors affecting individuals’ willingness to participate in smartphone-based data collection. He will also present the results from a series of experiments aimed at increasing willingness to participate by modifying features of the study design, such as the onboarding process and the incentive structure.


Alexander Wenz is a Research Fellow at the Mannheim Centre for European Social Research, University of Mannheim. He holds a Ph.D. in Survey Methodology from the University of Essex. His research examines the quality of novel methods of data collection, with a focus on mobile web surveys, smartphone apps, wearable sensors, and digital behavioral data.

Community-Based Fact-Checking on Twitter

Misinformation undermines the credibility of social media and poses significant threats to modern societies. As a countermeasure, Twitter has recently introduced “Community Notes,” a community-driven approach to address misinformation on its platform. This talk will provide an overview of how users interact with this new feature. Our research shows that users perceive a relatively high share of community-created fact checks as informative and helpful. Furthermore, we find that it is crucial for users to link to trustworthy external sources in their fact checks to underpin their judgments. Despite this promising potential, our analysis also suggests that community-based fact-checking faces challenges in reaching consensus on posts by influential user accounts. Additionally, we provide insights into the spread of misleading vs. not misleading community fact-checked posts on Twitter. Unlike earlier studies analyzing the spread of misinformation listed on third-party fact-checking websites, we find that community fact-checked misleading posts are less viral than not misleading posts. Our findings are relevant for future misinformation studies and for attempts to implement community-based approaches to combat misinformation on social media.


Nicolas Pröllochs is a Tenure-Track Professor of Data Science at the University of Giessen. Prior to that, he obtained his PhD in Information Systems at the University of Freiburg and worked as a postdoctoral researcher in machine learning at the University of Oxford. His research focuses on data science methods for understanding and predicting human behavior on digital platforms. Current research projects leverage data science to study user behavior in a broad selection of areas, including social networks, online media, and digital communication.

When liars are considered honest: competing conceptions of truth and honesty

In today’s polarized world, partisans find it increasingly difficult to agree on a shared body of facts. Here we argue that even the very notions of honesty and truth have fractured into two distinct constructs: one focused on authenticity and belief speaking, and one focused on evidence and truth seeking. We analyze tweets by all members of both houses of Congress from 2012 onward to examine the prevalence of belief speaking and truth seeking. We find a robust and strong correlation between a higher share of belief speaking and links to low-quality information sites, especially among members of the Republican Party. We find some evidence for a weaker correlation between a higher share of truth-seeking tweets and links to high-quality information sites. We also find that the content posted on the linked sites themselves reproduces the correlation between a high proportion of belief-speaking words and low information quality. We suggest that the increasing prevalence of misinformation is in part driven by a new ontology of truth that prioritizes belief speaking over truth seeking and is fuelled by the public-facing speech of elected officials.


Professor Stephan Lewandowsky is a cognitive scientist at the University of Bristol whose main interest is in the pressure points between the architecture of online information technologies and human cognition, and the consequences for democracy that arise from those pressure points.

The (individual-level) effects of data-driven campaigning

By gathering and analyzing voters' personal data on social media, modern political campaigns can identify audience segments and disseminate ads tailored to specific segments of the public. This campaigning practice is labeled data-driven. Whilst data-driven strategies are allegedly prevalent in political campaigns, evidence regarding their actual effectiveness is scarce. This project focuses on the effects of data-driven campaigning on voters’ immediate and behavioral responses. Various methods were applied to disentangle the causal mechanisms. Specifically, a browser tracking study was combined with a 4-wave panel survey to investigate how online political ads affect voting behavior. A mobile experience sampling method with an event-contingent sampling design was combined with content analysis and surveys to examine the impact of issue congruency on immediate responses. The results showed that data-driven campaigning has a small but positive impact on voters’ ad perception, party evaluation, and vote choice. The methodological approach also offers an innovative and plausible way to study individuals’ responses to media exposure in the hybrid media landscape.


Xiaotong Chu is a Ph.D. candidate jointly based at the Amsterdam School of Communication Research (ASCoR) of the University of Amsterdam and the Strategic Communication Group (COM) of Wageningen University & Research. Her research focuses on how targeting practices in modern electoral campaigns affect individual-level responses. She has experience in quantitative research methods such as (automated) content analysis, experiments, surveys, and the experience sampling method.

Out of One, Many: Using Language Models to Simulate Human Samples

We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the “algorithmic bias” within one such tool, the GPT-3 language model, is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property algorithmic fidelity and explore its extent in GPT-3. We create “silicon samples” by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.


Lisa Argyle is an Assistant Professor of Political Science at Brigham Young University and is currently spending a year as a visiting research fellow at Washington University in St. Louis. She earned a Ph.D. from the University of California, Santa Barbara, in 2016. Lisa blends political psychology with computational social science to study political attitudes and participation in the United States. Her work has been published both in top political science journals and in interdisciplinary venues such as PNAS. In addition to pioneering the role of large language models in social science research, she uses surveys, experiments, and text analysis in her work. Her primary focus is on how people talk about politics with each other in the course of their everyday lives, and she is currently working on a book project about interpersonal political persuasion.

Smart surveys

Smart surveys exploit the capabilities of smart devices, such as computing power, local data storage, sensor measurements, and the linkage of public and personal online data. The main motivations for smart surveys are reducing respondent burden, improving data quality, and obtaining more accurate proxy measures of the statistical concepts of interest. Smart surveys form a bridge to big data and administrative data, but still keep respondents at the center of data collection.

In the presentation, the speaker will describe the various smart survey projects at Statistics Netherlands, and the larger European Statistical System (ESS) context in which they take place.


Barry Schouten is Senior Methodologist at Statistics Netherlands and Researcher with a secondment to the University of Utrecht at the Faculty of Social Sciences. After a Ph.D. in statistics, he started as a trainee at Statistics Netherlands, where in 2002 he received a position as junior methodologist at the Department of Methodology. In 2017 he became a professor by special appointment at Utrecht University. His main research interests are in non-response adjustment, non-response reduction, mixed-mode survey designs, adaptive and responsive survey designs, and smart surveys. He holds a one-year part-time appointment at GESIS (until August 2023).

Roadmap to universal hate speech detection

Hate speech is increasingly spreading on social media platforms (e.g., Twitter), where (pseudo-)anonymity enables people to target others without being recognized or easily traced. While this societal issue has attracted many studies in the NLP community, it poses three important challenges: hate speech detection models should be fair, work in every language, and consider the whole context (e.g., imagery). Solving these challenges would revolutionize the field of hate speech detection and help create a "universal" model. In this talk, the speaker will present her contributions in this area along with her views on future directions.


Debora Nozza is an Assistant Professor in Computing Sciences at Bocconi University. Her research interests mainly focus on Natural Language Processing, specifically on the detection and counteraction of hate speech and algorithmic bias in social media data in multilingual contexts. She recently received an award for her project MONICA, which will focus on monitoring coverage, attitudes, and accessibility of Italian measures in response to COVID-19.

When Small Decisions Have Big Impact: The Hidden Consequences of Algorithmic Profiling in Public Administration

Algorithmic profiling is increasingly used in public administration to support the allocation of limited resources. For example, in criminal justice systems algorithms inform the allocation of intervention and supervision resources; child protection services use algorithms to target risky cases and to allocate resources such as home inspections to identify and control health hazards; immigration and border control use algorithms to filter and sort applicants seeking residence in the country; and Public Employment Services use algorithms to identify job-seekers who may find it difficult to resume work and to allocate support programs to them. However, concerns have been raised that profiling tools may suggest unfair decisions and thereby cause (unintended) discrimination. To date, empirical evaluations of such potential side effects are rare. Using algorithm-driven profiling of jobseekers as an empirical example, I illustrate how different modeling decisions in a typical data science pipeline may have very different fairness implications. I highlight how fairness audits, statistical techniques, and social science methodology can help to identify and mitigate biases, and argue that a joint effort is needed to promote fairness in algorithmic profiling.


Ruben Bach is a computational social scientist at the Mannheim Centre for European Social Research at the University of Mannheim and an assistant research professor at the University of Maryland, College Park. Trained in the social sciences (Ph.D. in Sociology from University of Mannheim, 2018), he combines social science domain knowledge with methods borrowed from data science. His current work investigates biases and fairness issues in statistical profiling and AI systems as well as individuals’ perceptions of these systems. Moreover, he works on understanding online behavior, news consumption, and political communication in an increasingly polarized world.

Using computational linguistics to study (violent) extremism: challenges and opportunities

In recent years, extremism has increasingly been studied by means of computational linguistics, in which large-scale text analyses serve to shed light on, for example, the characteristics of terrorist writings or the development of extremist language use over time. Researchers and practitioners in this field are also increasingly interested in the nexus between language and violent extremist behaviour: what does someone’s language use say about their (violent) intentions? In this talk, I will discuss various recent projects aimed at answering this question, including the development of a custom psycholinguistic dictionary. Finally, I will reflect on the possibilities and pitfalls associated with using large-scale linguistic data to understand (and potentially predict) violent extremist behaviour.


Isabelle van der Vegt has a background in psychology and linguistics. She obtained her PhD in Security and Crime Science from University College London. Her research focuses on the use of computational linguistics to understand extremism and threats of violence. Following her PhD, she worked for the research institute of the Dutch Ministry of Justice and Security. Since August 2022, she has been an assistant professor in the Department of Sociology at Utrecht University.

The price of polarization in open peer-production

Online peer-production projects typically incorporate information infrastructures designed to support social evaluation so that participants can assess and appraise the contributions of others. Evaluation can be either positive (expressing agreement among participants) or negative (expressing disagreement), giving rise to signed event networks connecting participants through their contributions. How does the structure of this emergent network of positive and negative events affect the quality of teamwork in open peer-production?

We address this question in an analysis of the entire production history of more than 10,000 articles in Wikipedia, one of the largest and most successful examples of peer production currently in existence. We assume that two participants consider each other "friends" if they are strongly linked by positive relational events (agreement) and "enemies" if there is strong negative interaction (disagreement) between them. Balance theory predicts that Wikipedia contributors, for instance, tend to agree with the enemies of their enemies and disagree with the enemies of their friends. It is well known that perfect agreement with the predictions of balance theory gives rise to polarized networks that split into two mutually hostile factions.

However, different teams (i.e., groups of participants contributing to the same encyclopedic article) act in varying agreement with the predictions of balance theory and we hypothesize that this variation in the polarization of collaboration networks explains part of the variation in quality of the resulting articles.

In a comparison of the network mechanisms underlying the production of about 5,000 high-quality articles with the network mechanisms in a contrasting sample of comparable articles of lower quality, we find that contributors to high-quality articles display weaker tendencies to conform to the behavioral predictions of balance theory. This result supports the "price of polarization" hypothesis claiming that polarization of teams in open peer-production decreases the quality of the artefacts produced by them.
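The balance-theory prediction can be made concrete with the classic triad rule: a triad of participants is structurally balanced when the product of the signs of its three ties is positive (the friend of a friend, or the enemy of an enemy, is a friend). The minimal Python sketch below is an illustration only, not the relational-event modeling used in the study; it assumes a static signed network stored as a dictionary, and the function names and toy participants are hypothetical. It computes the share of closed triads that are balanced.

```python
from itertools import combinations
from math import prod


def is_balanced(signs):
    """Return True if a triad is structurally balanced.

    Each sign is +1 (positive tie, agreement) or -1 (negative tie, disagreement);
    a triad is balanced when the product of its three signs is positive.
    """
    return prod(signs) > 0


def balanced_share(edges):
    """Share of closed triads that are balanced in a static signed network.

    `edges` (hypothetical format) maps frozenset({u, v}) -> +1 or -1.
    Only triads in which all three pairs are tied are counted.
    """
    nodes = sorted({n for pair in edges for n in pair})
    closed = balanced = 0
    for a, b, c in combinations(nodes, 3):
        pairs = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(p in edges for p in pairs):
            closed += 1
            balanced += is_balanced(edges[p] for p in pairs)
    return balanced / closed if closed else float("nan")


if __name__ == "__main__":
    # Hypothetical toy network: two friendly pairs that oppose each other.
    toy = {
        frozenset(("alice", "bob")): 1,
        frozenset(("carol", "dave")): 1,
        frozenset(("alice", "carol")): -1,
        frozenset(("alice", "dave")): -1,
        frozenset(("bob", "carol")): -1,
        frozenset(("bob", "dave")): -1,
    }
    print(balanced_share(toy))
```

In the toy network every tie within the two pairs is positive and every tie across them is negative, so all four triads are balanced and the share is 1.0: exactly the two-faction, polarized structure that perfect agreement with balance theory produces. A team whose share stays well below 1.0 conforms less to balance theory, which under the "price of polarization" hypothesis would be associated with higher article quality.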


Jürgen Lerner has a diploma in Mathematics and a PhD in Computer Science from the University of Konstanz, where he heads the DFG-funded project "Statistical analysis of time-stamped multi-actor events in social networks". His general research areas are social network analysis and computational social science and his work is characterized by tight collaboration with social scientists. His research interests include statistical network modeling, networks of relational events, networks with positive and negative ties, analysis of open-peer production, and personal network analysis. Jürgen Lerner developed or contributed to various network analysis software, including visone and eventnet. He regularly teaches courses which are attended by students of computer science and social science alike. Currently he is interim professor for Computational Social Sciences and Humanities at the RWTH Aachen.

Data donations - promises and pitfalls of collaborative data collections

The collection of digital trace data has become a key endeavor for communication science research in recent years. Such data have the potential to revolutionize our understanding of social processes beyond (often lacking) self-reported measures of media usage and online behavior, while also suffering from their own issues with validity, ethics, and privacy. Two of the main challenges are the accessibility and representativeness of the data. Large datasets of digital traces in various domains exist, but they are owned by the platforms that generated them (e.g., Alphabet, Meta). Direct cooperation with platforms remains challenging and may further existing inequalities in access to data. For these reasons, it seems promising to collect data directly in cooperation with respondents using data donation techniques or other collaborative data collection approaches. In this way, users are at the center of data collection, allowing informed consent and increased agency over their own digital traces, a prerequisite for trust in scientific institutions and research projects. This approach also enables research designs that combine traces with other methods such as surveys, interviews, or experiments. During the talk, the Open Science Data Donation Framework (OSD2F) will be presented, showing an easy way for researchers to implement an interactive, client-side platform for collecting digital trace data across different services. Experiences from using data donations in several research projects will be shared, offering further insights into collecting data in collaboration with users.


Felicia Loecherbach is currently a PhD Candidate at the Department of Communication Science at the Vrije Universiteit Amsterdam and an incoming postdoc at the Center for Social Media and Politics at NYU. Before joining the Department at the Vrije Universiteit, she received a research master's degree from the Amsterdam School of Communication Research (2018) and a Bachelor’s degree in Communication Science and Philosophy from the University of Erfurt (2016).

Her research interests include (the diversity of) online news consumption and using computational methods in the social sciences. She is motivated by the impact that changes in online environments have on the understanding and usage of news. Specifically, she uses computational approaches to study when and where users come across different types of news – collecting digital trace data via innovative approaches such as data donations, analyzing different dimensions of diversity of the content and how it affects perceptions and attitudes of users. Apart from this, she has been involved in studying the challenges of different modes of news access, for example via news recommender systems and smart assistants.

She is part of the Computational Social Science Amsterdam group. Outside of her own research, she is the early career representative of the Computational Methods Division of the ICA and is involved in teaching computational methods to social scientists.

The Politus Project: How to Use AI to Obtain Representative Public Opinion from Social Media

The Politus Project aims at scaling up traditional survey polls for public opinion research with AI-based social data analytics. Politus develops an AI-based innovation that combines quantitative and computational methods in order to create a data platform that delivers representative, valid, instant, real-time, multi-country, and multi-language panel data on key political and social trends. The project will collect content information from Twitter and process it with AI tools to generate a large set of indicators on political and social trends through its data platform. The deep learning models and NLP tools will be designed from the ground up as language-independent and generalizable systems. The platform will deliver geolocated hourly panel data on the demography, ideology, topics, values, beliefs, behavior, sentiment, emotions, attitudes, and stance of users, aggregated at the district level. The Politus Project has been funded by the ERC (Proof-of-Concept) and by the Scientific and Technological Research Council of Turkey (TÜBİTAK), and it will extend the technological and scientific scope of the ERC-funded Emerging Welfare Project. In this seminar, Dr. Yörük and Dr. Hürriyetoğlu will describe the general methodology of the project, including data collection, data analysis, and their approach to representativeness, which is based on multilevel regression with post-stratification.


Erdem Yörük is an Associate Professor in the Department of Sociology at Koç University and an Associate Member in the Department of Social Policy and Intervention at University of Oxford. He serves as the principal investigator of the ERC-funded project “Emerging Welfare” (The New Politics of Welfare: Towards an “Emerging Markets” Welfare State Regime) and the H2020 project Social Comquant. He is also a member of Young Academy of Europe and an associate editor of European Review. He holds a Ph.D. from the Department of Sociology at Johns Hopkins University (2012). His work focuses on social welfare and social policy, social movements, political sociology, and computational social sciences. His work has been supported by the National Science Foundation (NSF), Ford Foundation, FP7 Marie Curie CIG, European Research Council StG, ERC PoC, H2020, and the Science Academy of Turkey. His projects have created two datasets on welfare and protest movements. His articles have appeared in World Development, Governance, Politics & Society, Journal of European Social Policy, New Left Review, Current Sociology, South Atlantic Quarterly, American Behavioral Scientist, International Journal of Communication, Social Policy and Administration, and Social Indicators Research, among others. 

Ali Hürriyetoğlu is a postdoctoral research fellow at KNAW in the Netherlands as part of the Odeuropa project, working on historical multilingual text processing. Dr. Hürriyetoğlu was a postdoctoral research fellow at Koç University in the European Research Council (ERC) projects “Emerging Welfare” (EMW) and “Social ComQuant: Excelling in Computational and Quantitative Social Sciences in Turkey” between 2017 and 2021. Dr. Hürriyetoğlu performed research on extracting actionable information from social media in the scope of his Ph.D. studies at Radboud University. He has been working in industrial, governmental, and academic settings to process news and social media text in various domains throughout his career. His recent research focus is on the robustness and the generalizability of text processing systems across contexts. Dr. Hürriyetoğlu has been proposing challenges and organizing shared tasks on socio-political event extraction since 2019 in the scope of CLEF, LREC, ACL, and EMNLP.

Electoral challenges to the representation of cultural diversity - evidence from deep neural networks

Why do voters discriminate against candidates from ethnic minorities? Drawing on social psychological theories of automatic social categorization and recognition effects, in this article, I argue that voters' familiarity with minority candidates can help reduce their implicit biases against the political representation of cultural diversity. To test this argument, I rely on both observational and experimental data. By means of an artificial recurrent neural network (RNN), I first automatically classify the ethnicity of all German candidates (N = 22,000) who competed in the past five federal elections (2005-2021). Linking these data to the electoral returns of candidates, I identify the effect of discrimination at the ballot box by comparing the electoral performance of minority candidates to the performance of native candidates running on the platform of the same party in neighbouring districts. The results provide evidence that electoral discrimination diminishes when voters become familiar with minority candidates through repeated candidacies. Drawing on Generative Adversarial Networks (GANs) to create synthetic candidate images, I then experimentally manipulate individuals' familiarity with minority candidates. In an online behavioural experiment, I expose subjects to these photorealistic, high-quality images, half of which contain subtle visual features of well-known, real-world minority politicians. The results confirm not only that the manipulated images evoke stronger feelings of familiarity in voters, but also that voters are more likely to ascribe relevant candidate traits to minority candidates who appear familiar to them. These findings have important implications for understanding demand-side challenges to the representation of cultural diversity in modern democracies.


Julia Schulte-Cloos is a Research Fellow at the Robert Schuman Centre for Advanced Studies, European University Institute. Her research interests centre on political behaviour, political sociology, computational social science and reproducibility.

Opportunities and Challenges in Generalising Abusive Language Detection

While models built and tested on a specific dataset and for a specific task often achieve very good performance, they tend to fail to generalise when applied to new, unseen data. In this talk I will discuss the importance and challenges of achieving generalisable performance in social media research, with a particular focus on abusive language detection. I will present some of our recent work in this direction, as well as discuss open challenges to further the capacity of generalisation, especially in abusive language detection.


Arkaitz Zubiaga is a lecturer at Queen Mary University of London, where he leads the Social Data Science lab. His research interests revolve around computational social science and natural language processing, with a focus on linking online data with events in the real world, among others for tackling problematic issues on the Web and social media that can have a damaging effect on individuals or society at large, such as hate speech, misinformation, inequality, biases and other forms of online harm.

Automatically explaining fact checking predictions

The past decade has seen a substantial rise in the amount of mis- and disinformation online, from targeted disinformation campaigns to influence politics, to the unintentional spreading of misinformation about public health. This development has spurred research in the area of automatic fact checking, from approaches for detecting check-worthy claims and determining the stance of tweets towards claims, to methods for determining the veracity of claims given evidence documents. These automatic methods are often content-based, using natural language processing methods, which in turn utilise deep neural networks to learn higher-order features from text in order to make predictions. As deep neural networks are black-box models, their inner workings cannot be easily explained. At the same time, it is desirable to explain how they arrive at certain decisions, especially if they are to be used for decision making. While this has been known for some time, the issues it raises have been exacerbated by models increasing in size, by EU legislation requiring models used for decision making to provide explanations, and, very recently, by legislation requiring online platforms operating in the EU to provide transparent reporting on their services. Despite this, current solutions for explainability are still largely lacking in the area of fact checking. This talk provides a brief introduction to the area of automatic fact checking, including claim check-worthiness detection, stance detection, and veracity prediction. It then presents some first solutions to generating and automatically evaluating explanations for fact checking.


Isabelle Augenstein is an Associate Professor at the University of Copenhagen, Department of Computer Science, where she heads the Copenhagen Natural Language Understanding research group as well as the Natural Language Processing section. Her main research interests are fact checking, low-resource learning, and explainability. Prior to starting a faculty position, she was a postdoctoral researcher at University College London, and before that a PhD student at the University of Sheffield. She currently holds a DFF Sapere Aude Research Leader fellowship on ‘Learning to Explain Attitudes on Social Media’, and is a member of the Young Royal Danish Academy of Sciences and Letters.

The hope and hype of using text data for the study of human behaviour

Over the past decade, two trends have fundamentally impacted how we do social science research: the availability of “found data” and advancements in statistical natural language processing. Unsurprisingly, text data are increasingly being adopted to study human behaviour through a new lens. These developments raise two exciting questions: what can we learn from text data about human behaviour, and what are the blind spots? This talk shows why text data hold some of the most exciting potential for computational social science research, but it will also make a case for a more cautious approach when using text data as a proxy measure. We will show two sides of text data. First, we will see how they can augment more traditional survey tools, using the example of people’s coping and struggles during the pandemic. Second, we will discuss recent findings on adversarial machine learning that merit attention for text classification models. The talk closes with a broader perspective on the role of text data for future computational social science research.


Bennett Kleinberg is an assistant professor in data science at the Department of Methodology and Statistics at Tilburg University and an honorary associate professor at the Department of Security and Crime Science at University College London. Previously, he held a position at the UCL Dawes Centre for Future Crime and obtained his PhD from the Department of Psychology at the University of Amsterdam. His research revolves around using text data to study human behaviour and seeks to bridge the gap between natural language processing (NLP) and the social and behavioural sciences. Currently, specific research areas include fundamental questions about the nexus between text data and psychological processes, the methodological boundaries of NLP, and applied NLP topics. He combines methods from experimental online research, statistical NLP, machine learning and simulation studies to study these topics. He is also an active contributor to the open science community - most recently with privacy-preserving text anonymisation software supported by grants from SAGE and the Dutch Research Council (NWO).

“Individuals acquire increasingly more of their political information from social media, and ever more of that online time is spent in interpersonal, peer-to-peer communication and conversation. Yet, many of these conversations can be either acrimoniously unpleasant or pleasantly uninformative. Why do we seek out and engage in these interactions? Who do people choose to argue with, and what brings them back to repeated exchanges? In short, why do people bother arguing online? We develop a model of argument engagement using a new dataset of Twitter conversations about President Trump. The model incorporates numerous user, tweet, and thread-level features to predict user participation in conversations with over 98% accuracy. We find that users are likely to argue over wide ideological divides, and are increasingly likely to engage with those who are different from themselves. In addition, we find that the emotional content of a tweet has important implications for user engagement, with negative and unpleasant tweets more likely to spark sustained participation. Although often negative, these extended discussions can bridge political differences and reduce information bubbles. This suggests a public appetite for engaging in prolonged political discussions that are more than just partisan potshots or trolling, and our results suggest a variety of strategies for extending and enriching these interactions.”

Speaker: Sarah Shugars, Center for Data Science, New York University

"Helen Nissenbaum's theory of privacy as contextual integrity provides a useful framework for assessing the (in)appropriateness of information flows across different contexts. Such assessments are increasingly important as we share more and more personal information, both intentionally and unintentionally, with a wide range of organizations and institutions. Contextual integrity argues that you must consider several factors, including the actors involved in data sharing (e.g., who is sending/receiving data), the nature of the content (e.g., sensitivity), the context of data sharing (e.g., healthcare, legal), and the transmission principles guiding data sharing (e.g., confidentiality;  consent). In this talk, I'll discuss two examples of how I've applied contextual integrity in my research. First, I'll share details from a recent study published in Social Media & Society assessing Facebook users' comfort with their data being used for research purposes. Second, I'll share ongoing research that evaluates workplace surveillance practices both during the pandemic and in the future. For both of these studies, we developed factorial vignettes to capture respondents' perceptions of the appropriateness--and their concerns with--various data collection practices. I will also share my thoughts on the main strengths and weaknesses of using contextual integrity to guide similar studies on data privacy.”Nissenbaum's theory of privacy as contextual integrity provides a useful framework for assessing the (in)appropriateness of information flows across different contexts. Such assessments are increasingly important as we share more and more personal information, both intentionally and unintentionally, with a wide range of organizations and institutions. 
Contextual integrity argues that you must consider several factors, including the actors involved in data sharing (e.g., who is sending/receiving data), the nature of the content (e.g., sensitivity), the context of data sharing (e.g., healthcare, legal), and the transmission principles guiding data sharing (e.g., confidentiality;  consent). In this talk, I'll discuss two examples of how I've applied contextual integrity in my research. First, I'll share details from a recent study published in Social Media & Society assessing Facebook users' comfort with their data being used for research purposes. Second, I'll share ongoing research that evaluates workplace surveillance practices both during the pandemic and in the future. For both of these studies, we developed factorial vignettes to capture respondents' perceptions of the appropriateness--and their concerns with--various data collection practices. I will also share my thoughts on the main strengths and weaknesses of using contextual integrity to guide similar studies on data privacy.”

Speaker: Jessica Vitak, Human-Computer Interaction Lab (HCIL), University of Maryland

There has been a great deal of concern about the negative impacts of online misinformation on democracy and society. In this talk, I provide an overview of my research on understanding why people share misinformation and how to combat the spread of low-quality content online. I first focus on the why question and describe a hybrid lab-field study in which Twitter users (N=1,901) completed a cognitive survey. I show that people who rely on intuitive gut responses over analytical thinking share lower-quality content. I then build on this observation with a Twitter field experiment (N=5,379) that uses a subtle intervention to nudge people to think about accuracy. I show that the intervention significantly improved the quality of the news sources they subsequently shared. Finally, I will talk about a follow-up study in which we directly corrected Twitter users (N=2,000) who had shared misinformation by replying to their false tweets with a link to a fact-checking website. We show that, unlike the subtle accuracy nudge, the direct public correction results in users sharing lower-quality content. Our experimental design translates directly into an intervention that social media companies could deploy to fight misinformation online.

Main references

[1] Mosleh M, Pennycook G, Arechar AA, Rand DG. “Cognitive reflection correlates with behavior on Twitter.” Nature Communications 12, 921, 2021.

[2] Pennycook G*, Epstein Z*, Mosleh M*, Arechar AA, Eckles D, Rand DG. “Shifting attention to accuracy can reduce misinformation online.” Nature, 2021.

[3] Mosleh M, Martel C, Eckles D, Rand DG. “Perverse Consequences of Debunking in a Twitter Field Experiment.” Conference on Human Factors in Computing Systems (CHI), 2021.

Speaker: Mohsen Mosleh, University of Exeter

Measurement of social phenomena is everywhere, unavoidably, in sociotechnical systems. This is not (only) an academic point: fairness-related harms emerge when there is a mismatch in the measurement process between the thing we purport to be measuring and the thing we actually measure. However, the measurement process is almost always obscured in algorithmic systems. We show how the measurement process helps reveal how social, cultural, and political values are implicitly encoded in sociotechnical systems. The concepts of content and consequential validity help us elicit and characterize the feedback loops between the measurement, social construction, and enforcement of social categories. We then explore the constructs of fairness, robustness, and responsibility, and point to lessons from other forms of governance in and for responsible AI. Together, these perspectives help us unpack how measurement acts as a hidden governance process in sociotechnical systems. Understanding measurement as governance supports a richer understanding of the governance processes already happening in AI, responsible or otherwise, and reveals paths to more effective interventions.

Speaker: Abigail Jacobs, University of Michigan

The framing of political issues and descriptions of people can influence policy and public opinion. In this talk, I will describe two studies of framing and show how we can use computational methods to uncover social trends, such as how ordinary people frame political issues or how societal attitudes have changed over time. First, I will introduce a new computational study of political framing on the topic of immigration. Using multiple framing typologies grounded in political communication theory, I will show how we developed new models to recognize these frames and, by analyzing millions of users' comments on immigration, demonstrate how ideology and region affect framing and how a message's framing influences both who responds and the amount of audience response. Next, I will describe other work from my lab on another type of framing, dehumanization, in a longitudinal analysis of LGBTQ descriptions in the New York Times.

Speaker: David Jurgens, School of Information and Electrical Engineering & Computer Science, University of Michigan

The unprecedented increase in social media use and the large-scale collection of information pose new threats as well as new opportunities. Modeling and managing complex interactive systems requires mining social and technological signals for new insights into human society and individual behavior. Online social networks have come to play an essential role in our access to information, and they act as a good proxy for studying population-level behavioral patterns and making individual-level predictions. In this talk, I will present my research on analyzing various account behaviors, from social bots disseminating misinformation to humans venting their emotions to their friends. First, I will present Botometer, a platform for detecting social bots that is widely adopted in academia and industry to study the dissemination of misinformation and to characterize automated behavior. Using Botometer's estimates, I will show how bot activity affects information spread, and I will present our results on estimating the prevalence of social bots and on the anomalous patterns found among the popular accounts investigated. Finally, I will demonstrate how we can leverage social media to study the evolution of human emotions at minute-scale resolution and at the population scale.

Speaker: Onur Varol, Faculty of Engineering and Natural Sciences and Principal Investigator at the VIRAL Lab, Sabanci University

“It is estimated that there are currently about four billion social media users worldwide. Although users of the various social media platforms usually do not have to pay money to use these services, their use is not free: users pay with their data.

This data business model creates serious problems, which are the focus of this talk. Among other things, the talk will discuss how social media platforms are designed to prolong online usage time (this provides the companies behind the platforms with more data), potentially resulting in addictive usage behavior, at least for a subgroup of users. Aside from that, the data business model may come at a cost for political opinion formation and privacy. With respect to privacy, the terms digital phenotyping and mobile sensing will be introduced, and the talk will discuss how well psychological traits (with a focus on personality) can be predicted from digital footprints. The talk ends with an outlook on the alternatives that exist to counter both the current data business model and surveillance capitalism in general.”

Christian Montag received his diploma in psychology in September 2006. In 2009 he obtained his PhD with psychobiological work testing Gray’s revised reinforcement sensitivity theory. In 2011 he received the venia legendi for psychology. Since September 2014 he has been Professor for Molecular Psychology at Ulm University, Germany. Since 2016 he has also been Visiting Professor at the NeuScan-Lab/Key Laboratory for Neuroinformation, UESTC in Chengdu, China. Christian Montag is interested in the molecular genetics of personality and emotions and in affective neuroscience. He combines molecular genetics with brain imaging techniques such as structural and functional MRI to better understand individual differences in human nature. In addition, he conducts research in the fields of neuroeconomics and (Internet) addiction, including new approaches from psychoinformatics.

Speaker: Christian Montag, Department for Molecular Psychology, Institute for Psychology and Education, Ulm University

"In this talk, I will present a series of results from two web tracking and survey studies conducted a) to study news aggregators’ influence on news diets and political polarization and b) to investigate the spread of disinformation in Germany. The first study was conducted in 2017 during the election campaigns for the last German federal election and the second study in 2019 during the campaigns for the European Parliament elections. Web browsing behavior of about 1,600 participants was tracked over several weeks in 2017 and web browsing behavior of about 1,000 participants in 2019. In addition, participants answered to several rounds of surveys measuring their political preferences and socio-demographic information. Results from these studies suggest that social media and search engines do not restrict the diversity of content that users are exposed to. Moreover, party preferences do not seem to influence respondents’ news diets, calling fears that social networks foster selective news exposure and political polarization into question. Regarding the spread of disinformation in Germany, results indicate that disinformation is a rather limited phenomenon. Social media, however, seem to play an important role in the dissemination of such content. Besides these results, I will comment on challenges that arise when working with such data and I will talk about the question what we can learn from online behavior about users’ political preferences."

Dr. Ruben Bach is a postdoctoral researcher in statistics and social science research methodology at the University of Mannheim. Trained in economics, in the social sciences and in survey methodology, his research focuses on questions related to the use of big data and digital trace data for social research (for example, web log data and social media data) and accompanying issues regarding data privacy. He also studies how computational methods from the machine learning and natural language processing toolbox can be applied to social scientific research problems. Recently, he started investigating (unintended) consequences of the use of artificial intelligence for social inequality and methods to de-bias "unfair" algorithms.

Speaker: Ruben Bach, University of Mannheim - School of Social Sciences

Distributional word vectors have recently been shown to encode many human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias), and (3) have been evaluated inconsistently and non-rigorously.

In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications, and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones.

Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly, without degradation of semantic information encoded in any of the input distributional spaces. Moreover, we successfully transfer debiasing models, by means of crosslingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.

Simone Paolo Ponzetto is Professor of Information Systems at the University of Mannheim and a member of the Data and Web Science Group, where he leads the Natural Language Processing and Information Retrieval group. His main research interests lie in the areas of knowledge acquisition, text understanding, and the application of natural language processing methods for research in the digital humanities and computational social sciences.

Authors: Simone Paolo Ponzetto, University of Mannheim (joint work with Anne Lauscher, Goran Glavaš, and Ivan Vulić)

Speaker: Simone Ponzetto, University of Mannheim

The unprecedented availability of large-scale datasets about scholarly output has quantitatively advanced our understanding of how science progresses. In this talk we present a series of findings from the analysis of a large-scale dataset of publications and scientific careers. We tackle the following three questions: How does impact evolve over a career? What is the role of scientific chaperones in achieving high impact? How interdisciplinary is our award system?
We show that impact, as measured by influential publications, is distributed randomly within a scientist’s sequence of publications, and formulate a stochastic model that uncouples the effects of productivity, individual ability, and luck in scientific careers. We show the role of chaperones in achieving high scientific impact and we study the relation between interdisciplinarity and scientific recognitions. Taken together, we contribute to the understanding of the principles governing the emergence of scientific success.

Speaker: Roberta Sinatra, IT University of Copenhagen and ISI Foundation

With populism on the rise around the world, researchers have been developing different ways to measure it in order to better understand its causes and consequences. When looking at the elite level, a natural source for analysis is communication produced by politicians and parties. Ideally, an automated method would classify texts from different countries and languages as populist, so that analyses could be scaled up to cover a large number of politicians and time frames. However, until now there has been no well-performing automated method for detecting populist discourse in texts, partly due to the lack of large training corpora, i.e., bodies of text that have been hand-coded for how populist they are. We address this problem by using the Global Populism Database (Hawkins et al. 2019), the largest comparative corpus of speeches hand-coded on a populism scale. It includes more than 1,000 speeches from 214 politicians in 60 countries. We use three methods (supervised machine learning, structural topic model scaling, and Wordscores) to find an accurate measure that can be applied to new texts. Our models show a moderate correlation with the hand-coded scores and acceptable performance when populism is treated as a binary indicator, suggesting that they can be used to measure populism in speeches that have not yet been assessed by human coders.

Authors: Bruno Castanho Silva (University of Cologne) and Rebecca Kittel (European University Institute)

Speaker: Bruno Castanho Silva, Cologne Center for Comparative Politics

Social science researchers agree on the relevance of tracing citizens’ information usage on the Internet and analyzing how algorithmic selection processes influence what they receive online. Yet many scholars still rely on classical survey research when trying to analyze online information behavior, although such research has been shown to be insufficient due to social desirability concerns and users’ limited capacity to remember their online behavior (Prior 2009; Scharkow 2016). Social scientists who already use computational tools that automatically register online information behavior often rely on web analytics software (e.g., Leiner, Scherr & Bartsch 2016). Such software, however, requires modifying the original web content of the targeted websites or focuses on one technological interface (e.g. For broader tracking efforts, some academic tools are only now emerging (e.g., Newstracker, see Kleppe & Otto, 2017; Roxy, see Menchen-Trevino & Karr, 2012; Web Historian, see Menchen-Trevino, 2016), whereas others have relied on market researchers’ tools. Most approaches, however, only allow researchers to identify URLs without capturing the content (for exceptions, see Dvir-Gvirsman, Tsfati, & Menchen-Trevino, 2016; Bodo et al., 2017).

This is where the project of Prof. Maier and her colleagues comes in. They are developing a tracking tool that tracks online information behavior across different platforms and extracts the content a user actually sees for further analysis. The latter feature is very important, as such screen scraping allows Prof. Maier and her colleagues to observe the algorithmically personalized content each user is exposed to. In contrast to most commercial solutions, the tool places a strong focus on privacy and offers different privacy options. WebTrack is a browser extension (so far adapted for Chrome and Firefox) that runs on desktop devices. In the presentation, Prof. Maier and her colleagues will demonstrate how the tool works and discuss the opportunities and challenges they are tackling.

Speakers: Michaela Maier, University of Koblenz-Landau; Silke Adam, Mykola Makhortykh, University of Bern

The average European eats too few fruits and vegetables. Current individual-centered interventions to promote healthier eating are not very effective, possibly because they do not consider that eating is actually a social activity. Eating together is more than just putting food in one’s mouth; it is the glue of social relationships. Feeling part of a group (eating chips together) is often more important than eating healthy food (alone). Food is also one of the central topics on social media: up to 85% of the pictures that adolescents share on Instagram contain food, two thirds of them junk food. Why do people post food pictures on social media? How do social media shape dietary norms and eating behaviors? In this talk, Jutta Mata will present observational longitudinal studies, experiments, and tracking of mobile devices that investigate these questions. She will also talk about new ideas on how to combine psychological and computational sciences to better understand eating-related activities on social media.

Jutta Mata is professor of health psychology at the University of Mannheim and an associated research scientist at the Max Planck Institute for Human Development, Berlin, the Mannheim Centre for European Social Research, and the University of Lisbon, Portugal. Her main research topics include understanding individual and environmental factors that determine weight-related health behaviors and the effects of health behaviors on well-being. She is also interested in the use and effects of mHealth and online social networks for healthy behavior. Jutta studied psychology in Göttingen, Lisbon, and Berlin. She was a member of the International Max Planck Research School LIFE and received her PhD in 2008. After being a postdoctoral research fellow at both the University of Lisbon, Portugal, and Stanford University, USA, she worked as a research scientist at the University of Basel, Switzerland, and the Max Planck Institute for Human Development. From 2014 to 2015 she was an assistant professor of health psychology at the University of Basel.

Speaker: Jutta Mata, University of Mannheim

The Propaganda Model (PM) discussed in Manufacturing Consent is a theory in political economy stating that the mass media are channels through which governments and major power groups pass down certain ideologies and mold a general consent according to their own interests. According to the authors, every piece of news has passed through a set of filters that ultimately determine whether the source event is deemed newsworthy. Current developments in communications, the digital availability of large-scale online news streams from every corner of the world, and our increasing capability to process all this information in many different ways give us the perfect environment to test social theories using quantitative methods. In our work, we take advantage of these data to test the theory laid out in the PM empirically. Previous work has used machine learning and natural language processing techniques but focused only on showing bias toward a political party in a sample of major news outlets. Here we make a first attempt at formalizing the model and its filters, and we help to explain how the media work by taking a computational approach. Results show a measurable media bias, with Chilean media markedly favoring the ruling political parties. This favoritism becomes clearer as we empirically observe a shift in the position of the mass media when there is a change in government. Furthermore, the results support the PM by showing that the Chilean media landscape is characterized by high levels of concentration in terms of ownership and topical coverage. Our methods reveal which groups of outlets and owners exert the greatest influence on news coverage and can be generalized to any nation’s news system. Our studies of geographic news coverage also indicate the presence of the second filter (advertising).
Experiments on predicting the communities with the largest share of readership show that these are highly correlated with the regions that have the largest populations, better socioeconomic status, and a distinct political preference. As far as we know, this is the first attempt to empirically test this political economy theory using data science.

Speaker: Erick Elejalde, L3S Research Center

We are more connected than ever. In part, this is due to the high availability (in both developed and developing countries) of relatively cheap smartphones that we carry with us all the time. In fact, in 2018, more than 52% of all website traffic was generated by mobile phones. Unlike desktops or laptops, one defining characteristic of mobile phones is that they are always geo-located, either by GPS, antenna triangulation, or simply antenna connections. Thus, mobile phone data sets, either Call Detail Records (CDRs) or the “data channel” (called XDRs), constitute a potential treasure trove of information about what people do in the physical world, not only when interacting with it, but also when accessing information online.

Through our association with Telefónica Chile's R&D department, we have been privileged in our access to mobile phone data sets for studying socially relevant issues. In this talk, I will present results from several studies using ecologically valid data sets of mobile phone usage (CDRs, XDRs, and, at an even lower level of analysis, some deep packet inspection) and of the towers the phones connect to, drawing conclusions about and predicting different kinds of behaviors, from social mixing to news consumption to gender equality. All these studies were conducted by analyzing web traffic, either by proxy through applications like Pokémon Go or directly through DNS resolution, as in the news study. I will conclude by talking about upcoming trends in mobile phone data analysis and its limitations, and spend some time discussing issues of privacy, data security, anonymization, and general data responsibility for researchers and for the company providing the data.

Speaker: Leo Ferres, Universidad del Desarrollo, Chile

In this talk, Philipp Schaer and Malte Bonart present a web scraping framework for monitoring the auto-completions of a large set of queries across various search providers. Query auto-completions in web search engines support users in formulating search queries and can point them to query candidates they were previously unaware of. When users search for well-known persons, such as politicians or celebrities, trending person-related suggestions appear in the ranking over a short period. For several years, the speakers have gathered daily search suggestions for the names of German politicians and political terms. They will describe the dataset and its essential characteristics. The talk will then turn to preliminary work on measuring topical biases regarding person-related attributes, such as gender or party affiliation. Finally, the speakers will discuss further research ideas regarding the credibility of person-related auto-completions and future cooperation with GESIS.

Philipp Schaer is Professor for Information Retrieval at TH Köln (University of Applied Sciences). He is a former team leader and postdoctoral researcher at the GESIS department Computational Social Science, where he led a team of computer, social, and information scientists. He has published on information retrieval topics such as query expansion, applied informetric methods in digital libraries, and the evaluation of information systems.

Malte Bonart is a doctoral researcher at TH Köln (University of Applied Sciences). His research focuses on the composition and evolution of query suggestions in web search related to politicians and political topics. He is part of the research training group on “digital society” funded by the federal state. Previously, he was a research assistant at the GESIS department Computational Social Science, where he collected, analyzed, and visualized large amounts of textual social media data.

Speakers: Philipp Schaer, Malte Bonart, TH Köln

Social networks are complex and dynamic systems. Individual nodes in networks, however, do not necessarily have an overview of the network as a whole but are mostly affected by their smaller (micro-level) neighborhoods. At the same time, emerging large-scale (macro-level) network outcomes such as segregation, cluster formation, or the distribution of knowledge have a direct impact on them and can restrict their opportunities to act. In the study of social network dynamics, it is thus important to consider two levels simultaneously: the macro level of large-scale network structures and the micro level of individuals’ preferences, opportunities, and actions. This talk illustrates how state-of-the-art statistical network methods and computational techniques can be combined to investigate the micro-macro link in social networks. Recent empirical work in the context of the Swiss StudentLife study will illustrate the value of this approach.

Christoph Stadtfeld is an assistant professor of Social Networks at ETH Zürich. His research focuses on the development and application of theories and statistical methods for social network dynamics. He holds a PhD from Karlsruhe Institute of Technology and has been a postdoctoral researcher and Marie-Curie fellow at the University of Groningen, the Social Network Analysis Research Center in Lugano, and the MIT Media Lab. His work is published in leading sociological and interdisciplinary journals, including Social Networks, Social Forces, Sociological Science, Sociological Methodology, and PNAS.

Speaker: Christoph Stadtfeld, ETH Zürich

Modern technology has drastically changed the way we interact and consume information. For example, online social platforms allow for seamless communication exchanges at an unprecedented scale. However, we are still bound by cognitive and temporal constraints. Our attention is limited and extremely valuable. Algorithmic personalization has become a standard approach to tackling the information overload problem. As a result, our exposure to our friends’ opinions and our perception of important issues might be distorted.

However, the effects of algorithmic gatekeeping on our hyper-connected society are poorly understood. During the talk, the speaker will discuss a model of opinion dynamics in which individuals are connected through a social network and adopt opinions as a function of the viewpoints they are exposed to. Nicola Perra will consider various filtering algorithms that select the opinions shown to each user (i) at random, (ii) based on time ordering, or (iii) based on her current opinion. Furthermore, he will analyze the interplay between such mechanisms and crucial features of real networks.

Nicola Perra is Senior Lecturer in Network Science in the Business School of the University of Greenwich, which he joined in August 2015. From 2014 to 2015 he was an Associate Research Scientist at Northeastern University, Boston, USA, where he had served as a Post-Doctoral Research Scientist from 2011 to 2014. From 2009 to 2011 he was a Research Associate at the Center for Complex Networks and Systems Research at Indiana University in Bloomington, USA.

He holds a PhD in Physics from the University of Cagliari, Italy. He has teaching experience in Physics, Mathematics, and Network Science. His research has been published in a wide range of peer-reviewed journals, conference proceedings, and book chapters. He is the editor of the book Social Phenomena: From Data Analysis To Models (Springer, 2015) and has organized several workshops on human dynamics, social modelling, and temporal networks at major international conferences in Physics, Network Science, and Computer Science. He has been elected to the council of the Complex Systems Society and to the steering committee of its annual conference. He is a member of the Network Society.

Speaker: Nicola Perra, University of Greenwich

What drives us to trust someone we have just met? Did I eat spaghetti for lunch because my colleague eats spaghetti? Do we become happier when we are nicer to our neighbors? Does our decision depend on the food we ate before? Research from different disciplines has attempted to investigate the motives and modulators of human decision-making. Humans tend to believe that their decisions are based on rational thought, so it is surprising how irrational the motives and modulators that guide our decisions are. Specifically, our decisions depend heavily on our interaction with our environment: this can be our social network, our own behavior towards the social environment, or simply the food we have eaten before. Here, I will present a series of recent studies from my lab in which we shed light on the psychological, neural, and metabolic motives and modulators of human decision-making.

Soyoung Q Park is a joint professor of Decision Neuroscience and Nutrition at the Charité – Universitätsmedizin Berlin and the German Institute for Human Nutrition (DIfE). Her research focuses on the social, neural, and metabolic mechanisms underlying human decision-making, with the ultimate goal of developing novel intervention strategies to optimize it. Her focus is on reward-based decisions, such as consumer decisions and decision-making in a social context. She studied Psychology at the Berlin Institute of Technology and received her PhD in Neuroscience from the Freie Universität Berlin while holding a stipend at the Berlin School of Mind and Brain at Humboldt University. After her PhD, she moved to Switzerland and worked as a postdoctoral researcher with Ernst Fehr and Philippe Tobler in the Department of Economics at the University of Zurich. For the last five years, Soyoung Park has been a professor of Social Psychology and Decision Neuroscience at the University of Lübeck, where she headed the Psychology degree programs (BA and MA). Her work is published in leading neuroscientific and interdisciplinary journals including Nature Communications, PNAS, and Scientific Reports.

Speaker: Soyoung Q Park, University of Lübeck

How do intellectual fields change? Raphael Heiberger and his colleagues contend that fields are composed of incumbents who continually appeal to the shifting research preferences of new recruits so as to renew and sustain salient positions and engrossment. In so doing, the field changes the “sermons” of its “religion” but reproduces the same kinds of “priests”. Mr. Heiberger and his colleagues develop this argument by studying the field of sociology. To trace semantic position-takings and field reproduction empirically, they concentrate on two crucial points in each scholar’s career: her entry into the field by earning a doctorate, and her becoming an advisor herself. In particular, they gather comprehensive data on over 80,000 sociology dissertations at U.S. universities and on those graduates’ subsequent academic careers. Using structural topic models, Mr. Heiberger and his colleagues infer the semantic positions students take in their theses and trace sociological research trends. Their findings reveal that positions related to the cultural turn and qualitative methods are rising in popularity among students, while survey-related methods are declining. Further, logistic regressions show that position-takings also significantly influence the likelihood that graduates become advisors in sociology, net of structural conditions (e.g., gender, race, performance). At the elite institutions of the field, both structural and semantic effects are most pronounced. Academic power caters to certain semantic positions while selecting persons with essentially the same forms of endogenously valued capital that fit traditional elite characteristics (ethnicity, performance).

Raphael H. Heiberger graduated in sociology from the Otto-Friedrich University of Bamberg with a thesis on price formation processes in financial markets. He was a visiting scholar at UC Berkeley with the Fulbright program and is a regular member of Dan McFarland's lab at Stanford University. He currently leads a research project at the University of Bremen. Funded by the German Federal Ministry of Education and Research, the group investigates contextual factors influencing scientific careers by applying computational methods to large text corpora and relational data.

Speaker: Raphael Heiberger, SOCIUM Forschungszentrum Ungleichheit und Sozialpolitik

With the advent of large-scale data and the concurrent development of robust scientific tools to analyze them, important discoveries are being made in a wider range of scientific disciplines than ever before. A field of research that has gained substantial attention recently is the analytical, large-scale study of human behavior, where many analytical and statistical techniques are applied to various behavioral data from online social media, markets, and mobile communication, enabling meaningful strides in understanding the complex patterns of humans and their social actions.

The importance of such research stems from the social nature of humans, an essential aspect of human nature that clearly needs to be understood if we are ultimately to understand ourselves. Another essential aspect is that humans are creative beings, continually expressing inspirations or emotions in various physical forms such as pictures, sounds, or writing. As we successfully probe the social behaviour of humans through science and novel data, it is natural and potentially enlightening to pursue an understanding of the creative nature of humans in an analogous way. What makes such research even more potentially beneficial is that human creativity has always been in an interplay of mutual influence with scientific and technological advances: it is supplied with new tools and media for creation and, in return, provides valuable scientific insights. In this talk, the speaker will present recent work on the mathematical analysis of color contrast in paintings and the construction of content-based influence networks in culture.

Juyong Park is an Associate Professor of Culture Technology at KAIST - Korea Advanced Institute of Science & Technology. The speaker holds a Ph.D. in Physics and Complex Systems from the University of Michigan and was a Research Fellow at Northeastern University and Harvard Medical School. His research interests include culture and cultural phenomena from a complex-systems perspective.

Speaker: Juyong Park, KAIST

Many social, technological or biological systems are formed by a complex pattern of connections between their constituents; for example, billions of users interact through online networks such as Facebook and Instagram, billions of electronic machines interact via physical connections in the Internet, and thousands of billions of synaptic connections comprise the neural network of our brain. The network structure of such systems is known to have a huge impact on how they operate. Here, Ali Faqeeh reviews some of his recent findings on the effect of structure on networked behavior and discusses potential applications in the social sciences. He presents results on different structural properties, including multilayer networks, hidden network geometry, community structure, and noisy (imperfect) data on structure and/or dynamics. Ali Faqeeh discusses how each of these properties plays a crucial role in applications such as determining the robustness of networks, optimal vaccination strategies, efficient navigation, and the identification of the most influential spreaders.

Ali Faqeeh is a postdoctoral fellow at the Mathematics Application Consortium for Science and Industry (MACSI), University of Limerick, Ireland, and a non-resident research fellow of the Center for Complex Networks and Systems Research (CNetS), Indiana University, Bloomington, IN, USA. He is part of the project “Mathematical modeling of social spreading phenomena” and is currently working on the dissemination of scientific research through academic publications, the spread of online information on Twitter, and the identification of the most influential spreaders in online platforms. From 2016 to 2018 Ali Faqeeh was a postdoctoral researcher at CNetS and at the School of Informatics, Computing, and Engineering, Indiana University, USA. He holds a Ph.D. in Applied Mathematics from the University of Limerick, Ireland (2016), an M.Sc. in Condensed Matter Physics (2012) and a B.Sc. in Physics (2009) from Isfahan University of Technology, Iran. His research interests include computational social science, complex networks and systems, and modeling of stochastic processes.

Speaker: Ali Faqeeh, University of Limerick, Ireland and Indiana University, Bloomington

Search engines are seen by their users as trustworthy and neutral intermediaries between users and the content of the web. This view is mistaken, however: search engine operators pursue their own interests, which has led, among other things, to an antitrust case by the European Commission against Google. In addition, content providers and the search engine optimizers they commission have considerable opportunities to influence the search results of Google and other search engines in their favour.

This raises the question of what results or what kind of results users get to see in the top positions of search engines. Dirk Lewandowski and his colleagues seek answers to this question by automatically evaluating the top results for a large number of search queries on the same topic. They extract the search queries from search engine log files so that they can realistically map the query behavior of users. The analysis of the search results takes place both on the level of the domain and on the level of the providers behind them (by automatically collecting the imprint data of the websites).
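The domain-level part of such an analysis can be sketched in a few lines of Python. This is a hypothetical illustration, not the software developed in the project: it assumes result lists have already been collected per query, counts how many of the top-`k` result slots each domain occupies, and optionally merges domains via a `provider_of` mapping that stands in for the collected imprint data. The function name and the example URLs (which use the reserved `.example` domain) are invented.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_shares(serps, top_k=10, provider_of=None):
    """Share of all top-k result slots held by each domain (or provider).

    serps: dict mapping a query string to its ranked list of result URLs.
    provider_of: optional dict mapping a domain to the company behind it
    (a hypothetical stand-in for imprint data).
    """
    counts = Counter()
    for urls in serps.values():
        for url in urls[:top_k]:
            host = (urlsplit(url).hostname or "").removeprefix("www.")
            # optionally collapse several domains onto one provider
            key = provider_of.get(host, host) if provider_of else host
            counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.most_common()} if total else {}


serps = {
    "car insurance comparison": [
        "https://www.compare-a.example/x",
        "https://compare-b.example/y",
        "https://compare-a.example/z",
    ],
    "cheap car insurance": [
        "https://compare-b.example/q",
        "https://insurer-c.example/",
    ],
}
print(domain_shares(serps))
print(domain_shares(serps, provider_of={"compare-a.example": "Holding X",
                                        "compare-b.example": "Holding X"}))
```

Aggregating by provider rather than by domain is what makes concentration visible when one company operates several sites; `str.removeprefix` requires Python 3.9 or later.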

In addition to software development, Dirk Lewandowski and his colleagues analysed search queries on the subject of insurance comparisons as a first use case. Among other things, it became apparent that Google's top search results, from which the majority of the hits are selected by the users, are provided by only a few companies and that these companies can thus exert a strong influence on the perception of a topic. Other topics that they will work on include gender stereotypes in the search results, controversial topics such as nuclear power or economic topics such as financing.

Dirk Lewandowski is a professor of information research and information retrieval at the Hamburg University of Applied Sciences, Germany. He is the editor of Aslib Journal of Information Management (formerly: Aslib Proceedings), an ISI-ranked information science journal. Dirk Lewandowski studied library science at the School of Library Science in Stuttgart, as well as philosophy, information science, and media studies at Heinrich Heine University in Düsseldorf. He received his Ph.D. from that university in 2005.

Dirk has published extensively in the areas of Web information retrieval, search engine user behaviour and the role that search engines play in society. His work has been published in some of the leading information science journals, including JASIST, Journal of Information Science and Journal of Documentation. Dirk has served as an expert to, among others, the High Court of Justice (UK) and the Deutscher Bundestag (German Parliament). He was named an ACM Distinguished Speaker in 2016.

Prof. Lewandowski authored and edited several books on search engines, including “Suchmaschinen verstehen” (Springer, 2015) and “Web Search Engine Research” (Emerald Group Publishing, 2012), as well as a series of German-language handbooks on search.

Speaker: Dirk Lewandowski, Hamburg University of Applied Sciences

The last decades of psychological research have been dominated by self-report questionnaires as the primary method of data collection. This method has been useful for studying inner processes such as feelings, thoughts, and emotions, as well as behavior. However, questionnaire data are also known to be subject to a series of biases, such as response styles (e.g., social desirability), ecological invalidity, and memory effects. Nevertheless, for lack of alternatives, the field of Psychology has embraced questionnaire data ever since. Currently, the digitalization of our society is progressing rapidly, from the use of smartphones and wearables to living in fully digital smart homes and environments. Whereas some years ago phones were merely simple communication devices, the application ecosystems of modern phones satisfy a wide range of daily human needs such as surfing the web, banking, listening to music, and dating, to name a few. Furthermore, smartphones are equipped with a large number of sensors and considerable computing capabilities. As a natural byproduct of user interactions, smartphones produce large amounts of data about where, when, and how people do what with their phones. Given the aforementioned lack of “real data” in Psychology, these data could allow for the systematic investigation of individual differences both across individuals (traits: Big Five personality, demographics) and as processes within single individuals over time (states: emotions, mood). In this talk, Clemens Stachl will present the current state of the PhoneStudy mobile sensing project at Ludwig-Maximilians-Universität München and first insights from the collected data. The aim of the PhoneStudy project is to develop a tool for both the collection and the analysis of actual behavioral and situational data in Psychology.

Clemens Stachl is a postdoctoral researcher at the Chair of Psychological Methods and Assessment at Ludwig-Maximilians-Universität München. His research focuses on the collection and analysis of behavioral data by means of consumer electronics (e.g., smartphones, cars). He is currently investigating the possibility of predicting psychological traits (e.g., personality) from digital traces of behavior collected with the PhoneStudy smartphone app.

Speaker: Clemens Stachl, Ludwig-Maximilians-Universität München

Twitter research to date has focused mainly on the study of isolated events, as described for example by specific hashtags or keywords relating to elections, natural disasters, public events, and other moments of heightened activity in the network. This limited focus is determined in part by the limitations placed on large-scale access to Twitter data by Twitter, Inc. itself. This research presents the first ever comprehensive study of a national Twittersphere as an entity in its own right. It examines the structure of the follower network amongst some 4 million Australian Twitter accounts and the dynamics of their day-to-day activities, and explores the Australian Twittersphere's engagement with specific recent events.

Dr. Axel Bruns is a Professor in the Digital Media Research Centre at Queensland University of Technology in Brisbane, Australia, and was a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation (CCi). He is the President of the Association of Internet Researchers. Bruns is the author of Blogs, Wikipedia, Second Life and Beyond: From Production to Produsage (2008) and Gatewatching: Collaborative Online News Production (2005), and a co-editor of Twitter and Society (2014), A Companion to New Media Dynamics (2012) and Uses of Blogs (2006). Bruns is an expert on the impact of user-led content creation, or produsage, and his current work focusses on the study of user participation in social media spaces such as Twitter, especially in the context of acute events.

Bruns’s main research interests are in social media, ‘big data’ research methods, produsage, citizen journalism, and online communities.

Speaker: Axel Bruns, Queensland University of Technology

In this talk, Bruno Ribeiro generalizes traditional node/link prediction tasks in temporal attributed networks to joint predictions over larger $k$-node induced subgraphs. Ribeiro shows why traditional network models fail at this task and introduces a potential solution to the problem. His key insight is to incorporate the unavoidable data dependencies in training into both the input features and the model architecture itself via high-order dependencies and subgraph embeddings. The strength of the representation is its invariance to isomorphisms and to varying local neighborhood sizes, while still taking node/edge labels into account in an inductive model that can be applied to unseen data. Learning also requires new sampling methods; here he will introduce the concept of Markov Chain Las Vegas for optimization as a more principled and flexible alternative to Contrastive Divergence.

Bruno Ribeiro is an Assistant Professor at the Department of Computer Science at Purdue University. He obtained his Ph.D. at the University of Massachusetts Amherst and did his postdoctoral studies at Carnegie Mellon University. His research interests are in machine learning, with a focus on sampling and modeling relational and temporal data.

Speaker: Bruno Ribeiro, Purdue University

In 1998, Sir Tim Berners-Lee famously pleaded that “Cool URIs don’t change”: Content on the web should always remain accessible through one and exactly one address. Almost twenty years later, nothing could be further from the truth. Instead, we are rapidly moving to a world where everything online is different for everybody, all the time. Personalized sites are tailored to user preferences, posts and comments are edited, hidden or deleted as time goes by, and every bit of information has an abundance of copies, variants and remixes.

The resulting challenges to empirical methods are huge: it has become virtually impossible to gather representative samples of online data and, even worse, those samples often don’t reflect what users actually see! Taking stock of current approaches to digital methods, I will argue that many widely used practices will soon be rendered obsolete by new technologies, changing user behavior, and declining access to APIs. However, by embracing the fragmented nature of the new web, we can expand our set of methods to secure scientific access, bolster precision, and develop new theoretical avenues.

Pascal Jürgens is a Research Associate at the Department of Communication at the Johannes Gutenberg-University of Mainz. His research focuses on the diffusion of information; fragmentation of and through information behavior; political communication, participation and protest culture online (e.g., petitions, protest movements, ad hoc incidents); computational quantitative methods (computer-based content analysis, time series, etc.); and social network analysis (Twitter, Facebook).

Speaker: Pascal Jürgens, Johannes Gutenberg-University of Mainz

Fairness in machine learning is an important and popular topic these days. Most papers in this area frame the problem as estimating a risk score. For example, Jack’s risk of defaulting on a loan is 8, while Jill's is 2. These algorithms are supposed to produce decisions that are probabilistically independent of sensitive features (such as gender and race) or their proxies (such as zip codes). Examples of such criteria include precision parity, true positive parity, and false positive parity between groups in the population. In a recent paper, Kleinberg, Mullainathan, and Raghavan (arXiv:1609.05807v2, 2016) presented an impossibility result on simultaneously satisfying three desirable fairness properties when estimating risk scores with differing base rates in the population. Tina Eliassi-Rad takes a broader notion of fairness and asks two questions: Is there such a thing as just machine learning? If so, is just machine learning possible in our unjust world? The speaker will describe a different way of framing the problem and present some preliminary results.
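The group parity criteria mentioned above can be computed directly from binary decisions, true outcomes, and group membership. The following Python sketch is an illustrative assumption, not the framing presented in the talk: it thresholds invented risk scores (on a hypothetical 0 to 10 scale, as in the Jack and Jill example), reports per-group precision, true positive rate, and false positive rate, and then the largest between-group gap for each. All names and data are hypothetical.

```python
import math
from collections import defaultdict


def _ratio(a, b):
    # undefined rates (empty denominator) are reported as NaN
    return a / b if b else float("nan")


def group_rates(y_true, y_pred, groups):
    """Per-group precision, true positive rate and false positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for y, yhat, g in zip(y_true, y_pred, groups):
        if yhat:
            counts[g]["tp" if y else "fp"] += 1
        else:
            counts[g]["fn" if y else "tn"] += 1
    return {
        g: {
            "precision": _ratio(c["tp"], c["tp"] + c["fp"]),
            "tpr": _ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),
        }
        for g, c in counts.items()
    }


def parity_gaps(rates):
    """Largest between-group difference per metric; 0 means exact parity."""
    gaps = {}
    for m in ("precision", "tpr", "fpr"):
        vals = [r[m] for r in rates.values() if not math.isnan(r[m])]
        gaps[m] = max(vals) - min(vals) if vals else float("nan")
    return gaps


# Invented risk scores; a score of 5 or more counts as "high risk".
scores = [8, 6, 7, 2, 9, 6, 1, 3]
y_true = [1, 1, 0, 0, 1, 0, 0, 0]  # base rates differ: 2/4 in A, 1/4 in B
groups = ["A"] * 4 + ["B"] * 4
y_pred = [s >= 5 for s in scores]
rates = group_rates(y_true, y_pred, groups)
print(rates)
print(parity_gaps(rates))
```

With these invented numbers, true positive parity holds exactly while precision and false positive rates differ by 1/6 between the groups, a small illustration of how such criteria can diverge when base rates differ.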

Tina Eliassi-Rad is an Associate Professor of Computer Science at Northeastern University in Boston, MA. She is also on the faculty of Northeastern's Network Science Institute. Prior to joining Northeastern, Tina was an Associate Professor of Computer Science at Rutgers University; and before that she was a Member of Technical Staff and Principal Investigator at Lawrence Livermore National Laboratory. Tina earned her Ph.D. in Computer Sciences (with a minor in Mathematical Statistics) at the University of Wisconsin-Madison. Her research is rooted in data mining and machine learning; and spans theory, algorithms, and applications of massive data from networked representations of physical and social phenomena. Tina's work has been applied to personalized search on the World-Wide Web, statistical indices of large-scale scientific simulation data, fraud detection, mobile ad targeting, and cyber situational awareness. Her algorithms have been incorporated into systems used by the government and industry (e.g., IBM System G Graph Analytics) as well as open-source software (e.g., Stanford Network Analysis Project). In 2010, she received an Outstanding Mentor Award from the Office of Science at the US Department of Energy.

Speaker: Tina Eliassi-Rad, Northeastern University

The availability of real-world data provides partial and indirect observations of real-world phenomena, making it possible to study and understand them. Unsupervised learning methods, such as clustering and latent feature models, are suitable techniques for easing the data analysis process, allowing us both to analyze and to make predictions on the data. However, most existing techniques assume the data to be homogeneous and i.i.d. This assumption might be too limiting in many real-world application domains, such as computational social science, where we often aim to analyze user data that contain not only the users’ socio-demographic information (i.e., heterogeneous data) but also their activity data (i.e., time-dependent data).

In this talk, Isabel Valera will present the key ideas behind her approach to unsupervised learning both on heterogeneous datasets, containing mixed continuous and discrete observations, and on continuous-time data, which are abundant in an increasingly networked digital world. She will then use the proposed approaches to analyze and make predictions on data collected from different application domains, including social networks.

Isabel Valera is a Minerva research group leader at the Max Planck Institute for Intelligent Systems. Isabel develops flexible and efficient probabilistic models and inference algorithms to fit and analyze real-world data. She is particularly interested in problems related to the unstructured and complex nature of real-world data, which are often time-dependent, heterogeneous, noisy, and might contain errors and missing values. Isabel obtained her PhD in 2014 and her MSc degree in 2012, both from the University Carlos III in Madrid, Spain. She has held a Humboldt Post-Doctoral Fellowship and was recently granted a Minerva Fast Track research group by the Max Planck Society.

Speaker: Isabel Valera, Max Planck Institute for Intelligent Systems

Over the last ten years, researchers have found themselves confronting a massive increase in available data sources. In the debates on how to use these new data, the research potential of “digital trace data” has featured prominently. While various commentators expect digital trace data to create a “measurement revolution”, empirical work has fallen somewhat short of these grand expectations. In this talk, Andreas Jungherr will attempt to trace the reasons for this. First, the traditional fields in the social sciences (perhaps with the exception of communication science) have shown a disappointing lack of interest in phenomena connected with the impact of the digital revolution on social life. This has led social scientists to neglect developing new concepts, or adapting existing ones, to account for the potential influence of digital technology on various aspects of social life. Second, the growing availability of digital trace data has led computer scientists to address questions traditionally in the purview of social science. Unfortunately, this growing interest in social phenomena as a research object has not been accompanied by critical reflection on the specifics of this research object or by engagement with available concepts and the current state of the respective research fields. Accordingly, empirical findings from this work are predominantly poorly connected with central debates in the social sciences and have therefore also failed to make an impact there. Finally, the nature of digital trace data as a data source for inferences about social phenomena has not been appropriately reflected upon. Instead of being naively treated as a true mirror of social phenomena, these data have to be critically interrogated according to their respective data-generating processes in order to identify which elements of social life they can inform on.
Only if these challenges are met by the field will we start to realize the promise of digital trace data in the social sciences.

Andreas Jungherr is Assistant Professor for Social Science Data Collection and Analysis at the University of Konstanz. His research focuses on the impact of digital technology on political communication and the use of digital trace data in the social sciences. His research has been published in the Review of International Political Economy, Journal of Communication, Journal of Computer-Mediated Communication, and The International Journal of Press/Politics.

Speaker: Andreas Jungherr, University of Konstanz

Human cooperation, although not a new topic in science, continues to attract attention and still raises many unanswered questions. In this talk, María Pereda summarizes the main work of her research career, from models to experiments on human cooperation. In addition, she presents the idea for a new experiment on perception biases, planned to be carried out at RWTH Aachen University and inspired by previous work by researchers at GESIS and RWTH Aachen.

María Pereda is a postdoctoral researcher at RWTH Aachen University, working with Markus Strohmaier at the Computational Social Sciences and Humanities group. Previously, she was a postdoctoral researcher at the University Carlos III de Madrid (Spain) in the multidisciplinary group for complex systems, GISC, working with Anxo Sanchez on the IBSEN project, which aimed to build a repertoire of human behavior in large (1,000+ people) structured groups using controlled experiments. She carried out her first postdoctoral research at the University of Burgos (Spain), studying the emergence and resilience of cooperation in ancient societies using complex systems methodologies. She received a Bachelor’s degree in Industrial Engineering, specializing in Electronics, in 2006, and a degree in Industrial Organisation Engineering (with distinction) in 2008, both from the University of Burgos. She received a Master’s degree in Research in Process Systems Engineering in 2010 and a Ph.D. in Process Systems Engineering from the University of Valladolid in March 2014 (with distinction). Her Ph.D. research applied different artificial intelligence techniques to an automatic control problem: the control of a wastewater treatment plant. Her major research interest is the study of complex systems and the discovery of patterns and unpredictable behaviors. The main methods of her research so far have been modelling, machine learning, game theory, and network theory.

Speaker: María Pereda, RWTH Aachen University

The analysis of political violence and contention using event data has become the state of the art in the discipline. Most of these event datasets are based on media reports, which are known to have various biases. This talk discusses two of them: the selection problem, which refers to the fact that media sources have uneven coverage across the world, and the accuracy problem, which means that the media may systematically misreport certain types of information. The talk presents analyses assessing the severity of these biases in conflict event data, and discusses implications for event data coding and analysis.

Nils B. Weidmann is Professor of Political Science and head of the "Communication, Networks and Contention" Research Group at the Department of Politics and Public Administration, University of Konstanz. Previously, he held research fellowships at the Centre for the Study of Civil War, Peace Research Institute Oslo (2011-12), the Jackson Institute, Yale University (2010-11), and the Woodrow Wilson School, Princeton University (2009-10). Nils received an M.Sc. in Computer Science from the University of Freiburg (Germany) in 2003, an M.A. in Comparative and International Studies from ETH Zurich (Switzerland) in 2008 and a Ph.D. in Political Science from ETH Zurich. His research deals with violent and non-violent contestation, with a particular focus on the impact of communication and information technology.

Speaker: Nils B. Weidmann, University of Konstanz

Digital media have changed political communication in modern societies. One of these changes is that laypersons increasingly exchange their political opinions in social networks and online communities. A specific concern about this new form of political communication is that it promotes attitude polarization and, thus, the fragmentation of societies, due to a phenomenon that has lately been described as political homophily. The hypothesis: in social media, people are more likely to select and exchange political content and communication that is consistent with their personal political attitudes. Due to this homogeneous information environment, pre-existing political attitudes are more likely to be affirmed and reinforced than in the offline world.

In Germany, political communication on Facebook drew public attention during the so-called “Refugee Crisis” of 2015 and 2016. Tobias Rothmund and his colleagues conducted two empirical studies to investigate whether and how political homophily could be observed in Facebook communication in this specific context. First, they conducted an online survey (N = 894, April 2016) to investigate whether Facebook users were more likely to report (a) selective exposure to information on the refugee situation in Germany and (b) a stronger false consensus effect regarding their political attitude on this topic. Their analyses revealed significant three-way interactions (Facebook Use x Attitude Valence x Attitude Strength) on both variables. Selective exposure and the false consensus effect were correlated with attitude strength, especially among Facebook users with negative attitudes towards refugees. Second, Mr. Rothmund and his colleagues investigated political communication in Facebook groups (N = 51,177 participants) that were concerned either with supporting refugees (e.g., Refugees.Welcome.Regensburg) or with criticizing the German government for the way it handled the crisis (e.g., Rücktritt Merkel & co.). News feeds were content-analyzed between June 2015 and May 2016. The researchers found evidence for differences in the content and structure of political communication between these Facebook groups.

The methodology and the results of the present studies are discussed in the light of the theoretical framework of political homophily and new challenges and trends in political communication and its academic investigation.

Tobias Rothmund is a junior professor of political psychology at the Institute for Communication Psychology and Media Education at the University of Koblenz-Landau. His research focuses on the psychological function of trust for cooperation in social groups and society; stability and change in political attitudes and ideologies; psychological reactions to norm violations and experiences of injustice in political decision making; reception and effects of violence in mass media; and motivated reception of science and research.

Speaker: Tobias Rothmund, University of Koblenz-Landau

Though some warnings about online “echo chambers” have been hyperbolic, tendencies toward selective exposure to politically congenial content are likely to extend to misinformation and to be exacerbated by social media platforms. Andrew Guess and his colleagues test this prediction using data on the factually dubious articles known as “fake news.” Using unique data combining survey responses with individual-level web traffic histories, they estimate that approximately 1 in 4 Americans visited a fake news website between October 7 and November 14, 2016. Trump supporters visited the most fake news websites, which were overwhelmingly pro-Trump. However, fake news consumption was heavily concentrated among a small group: almost 6 in 10 visits to fake news websites came from the 10% of people with the most conservative online information diets. Mr. Guess and his colleagues also find that Facebook was a key vector of exposure to fake news and that fact-checks of fake news almost never reached its consumers.

Andrew Guess is an assistant professor of politics and public affairs at Princeton University. His research sits at the intersection of political communication, public opinion, and political behavior. He uses a combination of experimental methods, large datasets, machine learning, and innovative measurement to study how people choose, process, spread, and respond to information about politics. Current or recent projects investigate online selective exposure, the dynamics of interest group mobilization over Twitter, and the persuasive effect of new information on individuals’ attitudes and beliefs.

Speaker: Andrew Guess, Princeton University

Social media technology is young, but has already played a part in numerous turbulent events across the world, from protests to highly polarized elections. The use of social media for misinformation, trolling and harassment has often led to it being described as a threat to democracy. Yet, not long ago, social media was seen as the spearhead of democratizing forces and of activists trying to make their voices heard in autocracies and, allegedly, as the cause of mass protest mobilisation in both democracies and autocracies. Moving beyond this simple binary idea about social media's impact on democracy, this presentation focuses on two different cases in which social media operates as a challenge and as an opportunity for democracy. Demonstrating the corrosive effects of social media incivility on online discussions, as well as its empowering impact on informal networks of citizens seeking ways to provide solidarity in conditions of institutional collapse, the presentation highlights the importance of context in understanding the broader implications of social media for democracy.

Yannis Theocharis is Assistant Professor at the Department of Media Studies and Journalism of the University of Groningen. He previously held positions as Alexander von Humboldt postdoctoral fellow and research fellow at the Mannheim Centre for European Social Research, where he was co-director of the “Social Media Networks and the Relationships between Citizens and Politics” project. His research interests are in political communication, political behaviour, and social networks. His work on these topics has appeared in political science, communication and interdisciplinary journals such as Journal of Communication, Journal of Computer-Mediated Communication, Social Science Computer Review, Electoral Studies, European Political Science Review and Journal of Democracy. The focus of his current research is on social media and incivility, and his book (co-authored with Jan W. van Deth) "Political Participation in a Changing World: Conceptual and Empirical Challenges" was published in 2018 by Routledge.

Speaker: Yannis Theocharis, University of Groningen

Recommender systems have become increasingly pervasive in our daily lives, supporting us in identifying relevant content in an overloaded information space. Much of the research in the Recommender Systems community has focused on building (mostly data-driven) recommendation models, which make strong and sometimes oversimplified assumptions about human behavior and preferences. In this talk, Elisabeth Lex will show how psychological insights can be used to develop new recommender algorithms that better reflect and predict user behaviour. First, a hashtag recommendation algorithm is introduced that mimics how people access information in their long-term memory. Ms. Lex and her colleagues found that temporal effects play a strong role in hashtag usage. Second, a computational model of human category learning is used to improve Collaborative Filtering by incorporating non-linear user-resource dynamics. Finally, she will discuss recent work with her colleagues on echo chambers and algorithmic fairness in recommender systems.

Elisabeth Lex is an assistant professor at Graz University of Technology. She heads the Social Computing research area at Know-Center, Austria's Research Center for Data-driven Business and Big Data Analytics. Her research interests include Recommender Systems, Social Network Analysis, Data Science, and Open Science. Elisabeth has been a work package leader in the FP7 IP Learning Layers project, scientific coordinator of the Marie Curie IRSES Web Information Quality Evaluation Initiative (WIQ-EI) project, and task leader in the H2020 Analytics for Everyday Learning (AFEL) project. Recently, Elisabeth was a member of the Expert Group on Altmetrics, which advised the European Commission's DG Research and Innovation on how to use social media signals to measure scientific impact. Elisabeth has (co-)authored 60+ peer-reviewed publications and regularly acts as reviewer and chair for major international conferences and journals. Among other courses at Graz University of Technology, Elisabeth teaches Web Science, and she will start a new Complex Systems course in 2018.

Speaker: Elisabeth Lex, Graz University of Technology

As publishing has become more and more accessible and basically cost-free, virtually anyone can get their words printed, whether online or on paper. Such ease of disseminating content doesn't necessarily go together with author identifiability. In other words: it's very simple for anyone to publicly write any text, but it isn't equally simple to always tell who the author of a text is. Telling the author of a text can be thought of at various levels of detail. For example, in some contexts, such as those of interest to advertising companies or legal institutions, it can correspond to profiling, namely determining certain characteristics of the author, such as sex and age. In other contexts, including ancient and contemporary literary or historical studies, identifying authors can mean being able to tell whether two texts are likely to have been written by the same person. The latter problem can take more than one form in practice: one could be faced with one unknown text to compare to another one written by a known author, or could be given a large number of unknown texts to be clustered according to authorship. To what extent is all this feasible? And is it meaningful?

In this talk, Malvina Nissim will discuss the specifics of such tasks, and describe a couple of systems that perform author profiling and author verification on different kinds of texts from different languages, experimenting with various linguistic and structural features. Ms. Nissim will also discuss such systems and their performance not only in terms of how they fare, but also in terms of what it means to profile and identify authors, and what challenges lie ahead for people working in this field.

Malvina Nissim is Associate Professor in Language Technology at the University of Groningen. She has extensive experience in modelling language phenomena from a computational perspective, with particular attention to sentiment analysis and author identification and profiling, especially on social media. She has (co-)authored 90+ peer-reviewed publications, and regularly serves as reviewer/chair for major international conferences and journals. She graduated in Linguistics at the University of Pisa, and obtained her PhD in Computational Linguistics at the University of Pavia, in collaboration with the University of Edinburgh. Before joining the University of Groningen, she was a tenured researcher at the University of Bologna (2006-2014), and a post-doc at the University of Edinburgh (2001-2005) and the National Research Council in Rome (2005-2006). She is the University of Groningen's 2016 Lecturer of the Year.

Speaker: Malvina Nissim, Language Technology, University of Groningen

In recent years, social media and online social networking sites have become a major disseminator of false facts, urban legends, fake news, or, more generally, misinformation. To overcome this problem, online platforms are, on the one hand, empowering their users (the crowd) with the ability to evaluate the content they are exposed to and, on the other hand, resorting to trusted third parties for fact checking stories. However, given the noise in the evaluations provided by the crowd and the high cost of fact checking, the above-mentioned measures require careful reasoning and smart algorithms. In this talk, the author will first describe a modeling framework based on marked temporal point processes that links noisy evaluations provided by the crowd to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Then, the author will introduce a scalable online algorithm, CURB, to select which stories to send for fact checking and when to do so, in order to efficiently reduce the spread of fake news and misinformation with provable guarantees. Finally, Manuel Gomez Rodriguez will show the effectiveness of his team's modeling framework and algorithm using real-world data gathered from Wikipedia, Stack Overflow, Twitter and Weibo. This talk includes joint work with Behzad Tabibian, Jooyeon Kim, Isabel Valera, Mehrdad Farajtabar, Le Song, Alice Oh and Bernhard Schoelkopf.

Manuel Gomez Rodriguez is a tenure-track faculty member at the Max Planck Institute for Software Systems. Manuel develops machine learning and large-scale data mining methods for the analysis, modeling and control of large social and information online systems. He is particularly interested in the creation, acquisition and/or dissemination of reliable knowledge and information, which is ubiquitous in the Web and social media, and has received several recognitions for his research, including an Outstanding Paper Award at NIPS’13 and a Best Research Paper Honorable Mention at KDD’10 and WWW’17. Manuel holds a BS in Electrical Engineering from Carlos III University in Madrid (Spain) and an MS and PhD in Electrical Engineering from Stanford University, and has received postdoctoral training at the Max Planck Institute for Intelligent Systems.

Speaker: Manuel Gomez Rodriguez, Max Planck Institute for Software Systems

Online Social Networks (OSNs) are increasingly being used as platforms for effective communication, for engaging with other users, and for creating social worth. The more likes, followers and shares a user receives on an OSN platform, the higher the user's social self-worth. Such metrics and crowdsourced ratings give an OSN user a sense of social reputation, which she tries to maintain and boost in order to be more influential in the network and attract more followers. Users sometimes artificially bolster their social reputation via blackmarket web services and crowdsourced manipulation services. In this talk, the author will describe various approaches to detect users with manipulated social reputation. The author and her colleagues have formulated an effective method that estimates the genuine social reputation of users with manipulated social metrics. The talk then goes a step further: not only detecting users with manipulated social reputation, but also predicting the correct social reputation of a user. Anupama Aggarwal and her colleagues use various attack models and show that their prediction of a user’s social reputation is robust to blackmarket services and crowdsourced manipulation.

Anupama Aggarwal is a PhD student at IIIT-Delhi, India and a member of Precog@IIITD. Her research focuses on the study of anomalous user behavior on online social networks. More broadly, she is interested in social computing and in data mining on social graphs to understand user behavior.

Speaker: Anupama Aggarwal, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi)

The self-measurement boom is linked to many risks, despite euphoric assessments and promises of benefit by developers, pioneers and companies. Lifelogging (the sum of all technologies and applications used for digital self-measurement), as a ‘disruptive’ technology, is changing our cultural matrix and thereby the institutionalized rules of coexistence. The cultural baseline that is currently changing is the manner in which the quantifiable consumer regards something as normal and socially desirable.

Measuring humans has always been an expression of rationalisation tendencies that have social implications. Over time, these tendencies have led to a new image of humanity, which is currently experiencing an update. The modern image of society is characterised by the translation of concrete objects and complex qualitative processes into abstract quantities. Lifelogging technologies have proven themselves to be an outstanding medium for this. The measurement of humans and their reduction to numerical objects and mere data sets is creating a negative principle of social organisation. Self-observation based on digital data is not only becoming more exact, it is also becoming increasingly divisive. The counter term to rational differentiation is, therefore, rational discrimination. This resulting phenomenon is located as a pathology of quantification between statistical and social discrimination, and its consequences are analysed.

From the perspective of cultural anthropology, digital self-measurement is nothing more than a modern-day return to the alchemical principle. The starting point is the ‘common’ person: the human who is not yet fully developed, or the human who represents a risk or a source of error or disturbance for society. With the help of quantification, one’s lifestyle is said to become more rational, and, in accordance with social standards, ‘common’ people should be transformed into ‘precious’ people. The effects of this digital transformation are explained against the background of theories about convivial tools (Ivan Illich), greedy institutions (Lewis Coser) and the outsourced self (Arlie Hochschild) in a society of assistance.

Stefan Selke is professor of “Sociology and Social Change” at Furtwangen University in Germany. He is also a research professor for “Transformative and Public Science”. His current research interests are the economy of poverty, reputation capital in the charity market, public sociology and the digitalisation of society.

Speaker: Stefan Selke, Furtwangen University

Nowadays, music aficionados generate millions of listening events every day and share them via services such as or Twitter. In 2016, the LFM-1b dataset, containing more than 1 billion listening events of about 120,000 users, was released to the research community and interested public. Since then, we have performed various data analysis and machine learning tasks on these large amounts of user and listening data. The insights gained helped to develop new listener models and integrate them into music recommender systems, in an effort to increase the personalization of recommendations. In this talk, I will elaborate on the following research topics we have targeted in the past two years:

  • analyzing music taste around the world and distilling country clusters
  • quantifying listener and country mainstreaminess
  • music recommendation tailored to listener characteristics
  • predicting user characteristics from music listening habits
  • predicting country-specific genre preferences from cultural and socio-economic factors

Speaker: Markus Schedl, Johannes Kepler University Linz, Department of Computational Perception

Social media has brought a revolution in how people are exposed to information and how they consume news. Beyond the undoubtedly large number of advantages and capabilities brought by social-media platforms, a point of criticism has been the creation of filter bubbles or echo chambers, caused by social homophily as well as by algorithmic personalisation and recommendation in content delivery. In this talk, I will present the methods we developed to (i) detect and quantify the existence of polarization on social media, (ii) monitor the evolution of polarisation over time, and finally, (iii) devise methods to overcome the effects caused by increased polarization. We combine existing studies and ideas from social science with principles from graph theory to design algorithms that are language independent, domain agnostic and scalable to a large number of users.

Kiran Garimella is a PhD student at Aalto University. His research focuses on identifying and combating polarization on social media. In general he is interested in making use of large public datasets to understand human behaviour. Prior to starting his PhD, he worked as a Research Engineer at Yahoo Research, QCRI and as an intern at Carnegie Mellon University, LinkedIn and Amazon. His work on polarization received the best student paper award at WSDM’17 and a best paper nomination at WebScience 2017.

Speaker: Kiran Garimella, Aalto University

The talk argues for the importance of forbidden triads (open triads with high weight edges) in predicting success in creative fields. Forbidden triads have been treated as a residual category beyond closed and open triads, yet we argue that they provide opportunities to combine socially evolved styles in new ways. Using data on the entire history of recorded jazz from 1896 to 2010, we show that observed collaborations have tolerated the openness of high weight triads more than expected, that observed jazz sessions had more forbidden triads than expected, and that the density of forbidden triads contributed to the success of recording sessions, measured by the number of releases out of the sessions’ material. The author also shows that the sessions of Miles Davis received an especially high boost from forbidden triads.
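
To make the definition concrete, here is a minimal sketch that lists forbidden triads in a weighted collaboration network, reading "open triad with high weight edges" as: two ties A-B and B-C that both reach a weight threshold, while A and C are not tied. The threshold, the `frozenset` edge format and the function name are assumptions for illustration only; the talk does not specify how "high weight" is operationalised.

```python
from itertools import combinations


def forbidden_triads(weights, threshold):
    """Return open triads (a, center, b) whose two existing ties are both 'strong'.

    weights: dict mapping frozenset({u, v}) -> tie weight (hypothetical input format,
             e.g. number of sessions two musicians recorded together).
    threshold: minimum weight for a tie to count as high weight (an assumption).
    """
    # Build adjacency sets from the weighted edge list.
    neighbors = {}
    for edge in weights:
        u, v = tuple(edge)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    found = []
    for center, nbrs in neighbors.items():
        # Each pair of the center's neighbours is a candidate triad centred on it.
        for a, b in combinations(sorted(nbrs), 2):
            if frozenset((a, b)) in weights:
                continue  # a and b are tied: closed triad, not open
            if (weights[frozenset((center, a))] >= threshold
                    and weights[frozenset((center, b))] >= threshold):
                found.append((a, center, b))
    return found
```

Because an open triad has exactly one node tied to both others, each triad is reported once, under its centre. Counting these per session and normalising by the number of possible triads would give a density measure, though how the authors normalise is not stated here.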

Speaker: Balazs Vedres, Central European University

We are witnessing a momentous transformation in the way people interact and exchange information with each other. Content is now co-produced, shared, classified and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This content explosion is to a large extent driven by a mix of novel technologies with a deep human drive for recognition.

This talk will describe the regularities that govern how social attention is allocated among all media and the role it plays in the production and consumption of content. It will also describe how the dynamics of attention not only help determine the emergence of public agendas but can also be used to predict the evolution of social trends.

Speaker: Bernardo Huberman, HP Labs and Stanford University

The many decisions people make about what information to attend to affect emerging trends, the diffusion of information in social media, and performance of crowds in peer evaluation tasks. Due to constraints of available time and cognitive resources, the ease of discovery strongly affects how people allocate their attention. Through empirical analysis and online experiments, we identify some of the cognitive heuristics that influence individual decisions to allocate attention to online content and quantify their impact on individual and collective behavior. Specifically, we show that the position of information in the user interface strongly affects whether it is seen, while explicit social signals about its popularity increase the likelihood of response. These heuristics become even more important in explaining and predicting behavior as cognitive load increases. The findings suggest that cognitive heuristics and information overload bias collective outcomes and undermine the “wisdom of crowds” effect.

Kristina Lerman is a Project Leader at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. Trained as a physicist, she now applies network- and machine learning-based methods to problems in social computing and social media analysis.

Speaker: Kristina Lerman, University of Southern California

For decades, physical behavioral labs have been a primary, yet limited, method for controlled experimental studies of human behavior. Now, software-based "virtual labs" on the Internet allow for studies of increasing complexity, size, and scope. In this talk, I highlight the potential of virtual lab experiments for studying social interaction and coordination. First, we explore collective intelligence and digital teamwork in "crisis mapping", where digital volunteers organize to assess and pinpoint damage in the aftermath of humanitarian crises. We simulate a crisis mapping scenario to study self-organization in teams of varying size, and find a tradeoff between individual effort in small groups and collective coordination in larger teams. We also conduct a study of cooperation in a social dilemma over a month of real time, using crowdsourcing participants to overcome the time constraints of behavioral labs. Our study of about 100 participants over 20 consecutive weekdays finds that a group of resilient altruists sustains a high level of cooperation across the entire population. Together, our work points to the potential of controlled, highly instrumented studies of social interaction; the importance of behavioral experiments on longer timescales; and how open-source software can both speed up iteration and improve the reproducibility of experimental work.

* based on joint work with Lili Dworkin, Winter Mason, Siddharth Suri, and Duncan Watts.

Andrew Mao is currently a postdoctoral researcher in Computational Social Science at Microsoft Research in NYC. His research focuses on studying collective intelligence and social interaction on the Internet, such as teamwork in online communities and coordination in crowdsourcing systems. Andrew specializes in designing and gathering data from real-time, interactive, web-based behavioral experiments, and he is the designer of TurkServer, an open-source platform for building such experiments. His work has appeared in journals including Nature Communications and PLoS ONE as well as computer science conferences such as AAAI, EC, and HCOMP. He received his PhD from Harvard University in 2015.

Speaker: Andrew Mao, Microsoft Research NYC

Technology has advanced to a point where a large part of the population carries a mini-computer in their pockets that is disguised as a phone. The role of phones has changed from simple communication devices to multi-functional information devices. They are packed with various sensors, such as GPS, gyroscopes, and accelerometers, which can collect contextual data about the device and the user. Thanks to a multitude of applications, they satisfy a large spectrum of different needs and stay with their users most of the time. Smartphone usage is a hot topic in ubiquitous and pervasive computing because of smartphones' popularity and personal nature.

The Menthal team has developed the Menthal framework for collecting and analyzing mobile users' data. It is part of one of the largest in-the-wild smartphone studies. They attracted a large number of participants by running the study in a start-up format: building a product that users want and then promoting it through media outlets. Since the launch of the project in January 2014, their app has been installed more than 400,000 times and the project has attracted more than 350,000 registered participants. From these participants, they have collected general phone measurements, such as time spent on the phone using apps or communicating, but also other interesting data such as mood/affect measurements and Big Five personality traits. The collected data allow researchers to study a number of problems in HCI, psychological sciences and medicine. In this talk they will present their framework from a technical point of view and afterwards discuss past and current results from their research project.

Speaker: Ionut Andone, University of Bonn

The way we express ourselves is heavily influenced by our demographic background. For example, we don't expect teenagers to talk the same way as retirees. Natural Language Processing (NLP) models, however, are based on a small demographic sample and treat all language as uniform. As a result, NLP models perform worse on language from demographic groups that differ from the training data, i.e., they encode a demographic bias. This bias harms performance and can disadvantage entire user groups.

Sociolinguistics has long investigated the interplay of demographic factors and language use, and it seems likely that the same factors are also present in the data we use to train NLP systems.

In this talk, I will show how we can combine statistical NLP methods and sociolinguistic theories to the benefit of both fields. I present ongoing research into large-scale statistical analysis of demographic language variation to detect factors that influence the performance (and fairness) of NLP systems, and how we can incorporate demographic information into statistical models to address both problems.

Speaker: Dirk Hovy, Computer Science department (DIKU), University of Copenhagen

With the growing use of the Internet, there has been an exponential increase in the use of online social media. Websites like Facebook, Google+, YouTube, Orkut, Twitter and Flickr have changed the way the Internet is used. There is a pressing need to investigate, measure, and understand privacy and security on online social media from various perspectives (computational, cultural, psychological). Real-world scalable systems need to be built to detect and defend against security and privacy threats on online social media. I will briefly describe some cool projects that we work on: TweetCred, OSM & Policing, OCEAN, and Call Me MayBe. Much of our research is made available for public use through tools or online services. Our work draws on techniques from Computational Social Science, Data Science, Statistics, Network Science, and Human Computer Interaction. In particular, in this talk, I will focus on the following: (1) TweetCred, a tool to extract intelligence from Twitter that can be useful to security analysts. TweetCred is backed by award-winning research publications in international and national venues. (2) How police in India are using online social media, and how we can use computer science to help police engage more with citizens and increase safety in society. (3) OCEAN: Open source Collation of eGovernment data and Networks, which shows how publicly available information on government services can be used to profile citizens in India. This work obtained the Best Poster Award at the Security and Privacy Symposium at IIT Kanpur, 2013, and has gained a lot of traction in Indian media. (4) Given an identity on one online social media service, we are interested in finding the digital footprint of the user on other social media services; this is also called the digital identity stitching problem. This work is also backed by an award-winning research publication.

Speaker: Ponnurangam Kumaraguru, Indraprastha Institute of Information Technology (IIIT), Delhi, India

Research into socio-technical systems like Wikipedia has overlooked important structural patterns in the coordination of distributed work. This paper argues for a conceptual reorientation towards sequences as a fundamental unit of analysis for understanding work routines in online knowledge collaboration. I outline a research agenda for computational social science researchers to understand the relationships, patterns, antecedents, and consequences of sequential behavior, extending methods already developed in fields like sociology and bioinformatics. Using a data set of 37,515 revisions from 16,616 unique editors to 96 Wikipedia articles as a case study, we analyze the prevalence and significance of different sequences of editing patterns. We illustrate the mixed-method potential of sequence approaches by interpreting the frequent patterns as general classes of behavioral motifs. We conclude by discussing the methodological opportunities of sequence analysis for expanding existing approaches to analyzing and theorizing about co-production routines in online knowledge collaboration.
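
As a rough illustration of what "frequent patterns" in editing sequences can mean, the sketch below counts contiguous runs of length k across per-article revision sequences. The edit labels ("add", "revert") and the function name are hypothetical; the talk's actual sequence-analysis methods (e.g. those borrowed from sociology and bioinformatics) are likely more elaborate than plain n-gram counting.

```python
from collections import Counter


def frequent_motifs(revision_sequences, k=3, top=5):
    """Count contiguous length-k runs of edit labels across all articles.

    revision_sequences: list of per-article lists of edit labels, in revision order
                        (hypothetical labels, e.g. "add", "revert").
    Returns the `top` most common k-length motifs with their counts.
    """
    counts = Counter()
    for seq in revision_sequences:
        # Slide a window of length k over the article's revision history.
        for i in range(len(seq) - k + 1):
            counts[tuple(seq[i:i + k])] += 1
    return counts.most_common(top)
```

Raw counts only give prevalence; judging whether a motif is significant would additionally require comparing against a null model, such as shuffled sequences.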

Speaker: Brian Keegan, Harvard Business School

Determining the relative centrality of actors, or the degree to which they are structurally important, is one of the most common techniques in social network analysis. Many indices have been proposed to measure a variety of centrality conceptions, and choosing the one that is most appropriate for the particular research question and data at hand proves a challenge in many empirical studies. We use a general result about all common centrality indices to motivate a re-conceptualization of centrality. Our new approach is the first instantiation of a recently introduced positional framework for network analysis. By breaking down complex analyses into comprehensible steps, multivariate data and theoretical assumptions can be integrated more flexibly. Several examples serve to illustrate this point.

Speaker: Ulrik Brandes, University of Konstanz

Characterising how we explore abstract spaces is key to understanding our (ir)rational behaviour and decision making. While some light has been shed on the navigation of semantic networks, little is known about the mental exploration of metric spaces, such as the one-dimensional line of numbers, prices, etc. Here we address this issue by investigating the behaviour of users exploring the “bid space” in online auctions. We find that they systematically perform Lévy flights, i.e., random walks whose step lengths follow a power-law distribution. Interestingly, this is the best strategy that can be adopted by a random searcher looking for a target in an unknown environment, and has been observed in the foraging patterns of many species. In the case of online auctions, we measure the power-law scaling over several decades, providing the neatest observation of Lévy flights reported so far. We also show that the histogram of single individuals' exponents is sharply peaked, pointing to the existence of an almost universal behaviour. Furthermore, a simple model reveals that the observed exponents are nearly optimal, and represent a Nash equilibrium. We rationalise these findings through a simple evolutionary process, showing that the observed behaviour is robust against invasion by alternative strategies. Our results show that humans share with other animals universal patterns in general searching processes, and raise fundamental issues in the cognitive, behavioural and evolutionary sciences.
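
A Lévy flight as defined above can be sketched in a few lines: at each step, pick a direction at random and draw the step length from a power law p(l) ∝ l^(-alpha) for l ≥ x_min. The sketch below assumes a continuous Pareto-type step distribution sampled by inverse transform, and recovers the exponent with the standard continuous power-law maximum-likelihood estimator. The parameter values (alpha = 2.0, x_min = 1.0) and function names are illustrative assumptions; this is not the authors' auction model or fitting procedure.

```python
import math
import random


def levy_flight_1d(n_steps, alpha, x_min=1.0, seed=None):
    """Simulate a 1D Levy flight: random direction, power-law step lengths.

    Step lengths follow p(l) ~ l**(-alpha) for l >= x_min (requires alpha > 1),
    drawn by inverse-transform sampling.
    """
    if alpha <= 1:
        raise ValueError("alpha must be > 1 for a normalisable power law")
    rng = random.Random(seed)
    position = 0.0
    path, steps = [position], []
    for _ in range(n_steps):
        u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1]
        step = x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))
        position += step if rng.random() < 0.5 else -step  # move left or right
        path.append(position)
        steps.append(step)
    return path, steps


def estimate_exponent(steps, x_min=1.0):
    """Maximum-likelihood estimate of alpha for a continuous power law above x_min."""
    tail = [s for s in steps if s >= x_min]
    return 1.0 + len(tail) / sum(math.log(s / x_min) for s in tail)


path, steps = levy_flight_1d(20000, alpha=2.0, seed=1)
print(round(estimate_exponent(steps), 2))  # should be close to 2.0
```

Most steps are short, but occasional very long jumps dominate the displacement; this mix of local search and rare relocations is what makes the strategy effective for a searcher with no knowledge of where the target lies.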

Speaker: Andrea Baronchelli, City University London

Human mobility has attracted considerable research interest because of its importance for application scenarios such as nearby place search, mobile context awareness, and mobile advertising. Despite the wealth of research on the analysis of human mobility patterns and on prediction models for individual users, little attention has been paid to the mobility patterns of user collectives across places in a city. In this talk, we will exploit network analysis techniques to view human movement in urban environments from the perspective of an aggregate networked system where nodes are Foursquare venues. We will discuss the geometric properties of place networks in a large number of metropolitan areas around the world and how those compare to other well-studied types of networks, such as online social networks or the web.

Next, we will shed light on the growth patterns of place networks in terms of node and edge generation processes. Motivated by the fact that a large number of new links emerge over time in these networks, we will define a link prediction task in this novel application domain, with the aim of predicting future interactions between Foursquare venues. The talk will close with a head-to-head comparison on the prediction task of three families of approaches: gravity models, which are well known in the human mobility literature; network-based techniques; and supervised learning algorithms.

Speaker: Anastasios Noulas, University of Cambridge

In the era of big data and social media analysis, as a way forward, I propose an alternative to vanity metrics or the quantification of trend and personal influence. Rather, for the study of Twitter, Facebook and other secondary social media, I would like to put forward a critical data analytics that is sensitive to big data critique on the one hand and embraces analytical strategies with digital methods based on expertise and engagement on the other hand, making findings and outputting visualisations which are both insightful for (ethical) social research and aware of the hegemony of the graph.

Richard Rogers is Department Chair of Media Studies and Professor of New Media and Digital Culture at the University of Amsterdam. He is most recently the author of Digital Methods (MIT Press, 2013), winner of the ICA outstanding book award, and Issue Mapping for an Ageing Europe (Amsterdam University Press, 2015), with Natalia Sanchez and Aleksandra Kil. He is Director of the Digital Methods Initiative and the Foundation, known for online mapping tools such as the Issue Crawler and the Lippmannian Device. He has received research grants from the Ford Foundation, Gates Foundation, MacArthur Foundation, Open Society Institute and Soros Foundation, and has worked with such NGOs as Greenpeace International, Human Rights Watch, Association for Progressive Communications, Women on Waves, Carbon Trade Watch and Corporate Europe Observatory.

Speaker: Richard Rogers, Digital Methods Initiative, University of Amsterdam

Traditionally, most statistical and media coverage of football has focused almost exclusively on goals and (occasionally) shots. However, most of a football game is spent away from the boxes, passing the ball around. The way a team passes the ball is the most characteristic measure of its “unique style”. In this talk we will show how the study of a passing network keeps track of a team’s playing style, and how network invariants such as PageRank provide an adequate measure of players’ involvement. Next, we will go further into the analysis of passing sequences and their likely outcomes, and show how passing patterns allow us to construct a “digital fingerprint” of a player’s style.
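The idea of scoring players in a passing network with PageRank can be sketched in a few lines of Python. This is an assumed setup, not the speaker's method: players are nodes, each directed edge (passer, receiver) is weighted by a hypothetical pass count, and the standard damped PageRank recursion (damping 0.85) is iterated until the scores stop changing. The player labels and counts below are made up for illustration.

```python
def pagerank(passes, damping=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank of a passing network, computed by power iteration.

    passes maps (passer, receiver) -> positive number of completed passes.
    Returns a dict player -> score; the scores sum to 1.
    """
    if any(w <= 0 for w in passes.values()):
        raise ValueError("pass counts must be positive")
    players = sorted({player for edge in passes for player in edge})
    n = len(players)
    # Total passes made by each player, used to normalize outgoing edge weights.
    out_weight = {p: 0.0 for p in players}
    for (src, _dst), w in passes.items():
        out_weight[src] += w
    rank = {p: 1.0 / n for p in players}
    for _ in range(max_iter):
        # Score held by players who never pass is redistributed uniformly.
        dangling = sum(rank[p] for p in players if out_weight[p] == 0)
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in players}
        for (src, dst), w in passes.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[p] - rank[p]) for p in players)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical pass counts for a four-player toy team.
example_passes = {
    ("GK", "CB"): 12,
    ("CB", "CM"): 30,
    ("CB", "GK"): 4,
    ("CM", "CB"): 10,
    ("CM", "FW"): 18,
    ("FW", "CM"): 6,
}
scores = pagerank(example_passes)
for player, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{player}: {score:.3f}")
```

With these made-up counts, the midfielder "CM", who receives most of the passes from the defence and feeds the forward, obtains the highest score; in a real analysis the edge weights would come from match event data.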

Speaker: Javier López Peña, University College London

The convergence of social and technical systems provides us with a wealth of data on the structure and dynamics of social organizations. It is tempting to utilize these data in order to better understand how social organizations evolve, how their structure is related to their "success", and how the position of individuals in the emerging social fabric affects their performance and motivation. Taking a complex network perspective on these questions, in this talk I will introduce recent research results obtained in the context of collaborative software engineering. These results demonstrate the potential of network-based data mining methods in the study of social organizations. At the same time, I will highlight fallacies arising in the application of the complex networks perspective to social systems.

Speaker: Ingo Scholtes, ETH Zürich

It has become popular to tap into the "intelligence of the crowd" on the Internet. This talk argues that more often than not, the crowd flips from intelligence to madness, showing more characteristics of football hooligans than of complex problem-solving behavior. This is in contrast to what I call "creative swarms", where small teams of intrinsically motivated people work together in Collaborative Innovation Networks (COINs) to invent something radically new. The key difference is in motivation: crowds are motivated by money, power and glory, while swarms are intrinsically motivated by the problems they are trying to solve.

The talk introduces a collaboration scorecard made up of six key variables (“honest signals”) indicative of creative swarms. The variables are computed by analyzing global communication on the Web, on Twitter, and in Wikipedia, communication in organizations through e-mail, and communication in small teams through sociometric badges.

The talk is illustrated by many examples, with emphasis on high-tech firms and healthcare. For instance, it shows how customer satisfaction and employee attrition are predicted in a large Indian outsourcing company by analyzing the company’s e-mail archive. It also introduces the Chronic Collaborative Care Network (C3N) at Cincinnati Children's Hospital, where COINs of medical researchers, physicians, patients and their families are working together to improve the lives of patients with Crohn's disease, diabetes, and cystic fibrosis. Analyzing the e-mail archive of the C3N innovation teams and mirroring their communication behavior back to them, a process called “virtual mirroring”, helps these teams become more creative through improved communication.

Speaker: Peter A. Gloor, MIT Sloan School of Management

Query-specific Wikipedia Construction

We all turn to Wikipedia with questions we want to know more about, but eventually find ourselves at the limit of its coverage. Instead of providing "ten blue links", as is common in Web search, my goal is to answer any web query with something that looks and feels like Wikipedia. I am developing algorithms to automatically retrieve, extract, and compile a knowledge resource for a given web query. I will talk about a supervised retrieval model that can jointly identify relevant Web documents and Wikipedia entities and extract support passages.

Network Topic Models

Topic models such as Latent Dirichlet Allocation are an unsupervised technique to extract word clusters with topical character from a given corpus of text documents. Often we find text documents with an underlying link structure, or a network in which nodes are associated with text content. It is often assumed that connected nodes have some shared trait or interest which motivated the forming of the connection. In this talk, I will discuss several topic model extensions for textual network data. This includes the Citation Influence Model [4], which quantifies the strength of citations in an acyclic graph through a topic model. Furthermore, I will discuss the Shared Taste Model [5], which learns topics that capture shared interests in an undirected social network. As communication between users is often off-limits due to privacy concerns, the model learns from public text written by users, such as tweets, tags, and posts. The goal is to predict which of the user's friends is interested in the content. The source code for both models is available on GitHub.

Speaker: Laura Dietz, Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts

Recent work: Improving Website Hyperlink Structure Using Server Logs

Good websites should be easy to navigate via hyperlinks, yet maintaining a link structure of high quality is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes are frequent. To support human editors, we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of as yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia and (Joint work with Ashwin Paranjape and Jure Leskovec of Stanford, and Leila Zia of Wikimedia)

Ongoing work: Media Coverage of Death 

Death is an inevitable fact of the human condition and as such draws much attention. The deaths of famous people tend to be widely covered by the media in the form of obituaries and news articles, and may lead to a sustained change in the way their lives are collectively remembered. In this work in progress, we ask how deceased famous people are remembered by the media. To shed light on this question, we identify a set of notable people who died during the six years from 2008 to 2014 and track them in a large corpus of news articles and blog posts spanning the entire six-year period. Our results show that death generally has a profound impact on how people are perceived by the media. Further, we find that posthumous media coverage varies with the circumstances of death and the biographic background of the deceased. (Joint work with Jure Leskovec and Christopher Potts of Stanford)

Speaker: Robert West, InfoLab at Stanford University

At the beginning of 2014, in response to growing concerns about the role played by data mining/machine learning algorithms in decision-making, U.S. President Obama called for a 90-day review of big data collection and analysis practices. The resulting report concluded that “big data technologies can cause societal harms beyond damages to privacy”. In particular, it expressed concerns about the possibility that decisions informed by big data could have discriminatory effects, even in the absence of discriminatory intent, further imposing less favorable treatment on already disadvantaged groups. In its recommendations to the President, the report called for additional "technical expertise to stop discrimination", and for further research into the dangers of "encoding discrimination in automated decisions".
In parallel with developments in anti-discrimination legislation, efforts at fighting discrimination have led to the development of anti-discrimination techniques in data mining. Some proposals are oriented to the discovery and measurement of discrimination, while others deal with preventing data mining from itself becoming a source of discrimination through automated decision making based on discriminatory models extracted from inherently biased datasets. In this talk, I will introduce some of the recent techniques for discrimination prevention, simultaneous discrimination and privacy protection, and discrimination discovery, and show some recent results.

Speaker: Sara Hajian, Eurecat, Technology Centre of Catalonia

Due to its low acquisition cost and sheer scale, social media is becoming a popular data source for studies that track health trends. These studies usually take a population-centric approach in which the “ground truth” used for validation and model fitting is derived either from time series data, e.g. from temporal influenza activity, or from geographically varied data, e.g. from county-level obesity rates in the US. In this talk, I will present recent and ongoing work that uses social media data to study lifestyle diseases such as obesity. The first line of work takes the population-centric approach and uses food mentions on Twitter to study obesity. I will then move on to individual-centric health studies of obesity and dieting. Such fine-grained analysis is made possible by (i) labeling individual users as “is overweight or not” using their profile pictures, and (ii) analyzing users whose internet-enabled smart scales tweet their weight. I’ll conclude by outlining a vision of how social media data, physical sensor data, and electronic health records could be combined in a clinical setting to provide a more holistic view of a patient’s health.

Speaker: Ingmar Weber, Qatar Computing Research Institute

Prominent data scientists have declared “the end of theory” in the era of big data. I argue that it is rather the beginning, due to new opportunities for a relational theoretical framework. From astronomy to neuroscience to particle physics, scientific knowledge depends decisively on the available tools for observation. For the past century (or more), the survey has been the single most important observational tool for social science. During this time, enormous advances have taken place in our ability to reduce systematic bias in sampling hidden populations, to reduce measurement error in the responses to survey items, to reduce statistical error in the analysis of results, and to reduce inferential error in causal models of the associations among the measures. Nevertheless, increasing confidence in survey technology has paradoxically reinforced a debilitating theoretical blinder that has compromised the ability of social science to elicit confidence in predictions. What is worse, this blinder has largely escaped notice through a combination of ideological bias and reluctance to pull back the covers on problems for which we have no solution. The good news is that a solution is finally on the horizon.

Speaker: Michael Macy, Goldwin Smith Professor of Arts and Sciences and Director of the Social Dynamics Laboratory, Cornell University

The personal stories that people post to their public weblogs offer a glimpse into the everyday lives of people. In this talk I will discuss our efforts to automatically gather tens of millions of these stories, and use them as a dataset for investigating different populations of authors. I will discuss our work on analyzing the stories that people tell about health emergencies (strokes), and how this led us to concerns about sample bias. I will describe our ongoing work on bias correction for social-media samples, and discuss opportunities afforded by populations of extremely prolific webloggers whose demographic information can be readily extracted from the stories they share.

Speaker: Andrew S. Gordon, Institute for Creative Technologies, University of Southern California

With the widespread adoption of social media sites like Twitter and Facebook, there has been a shift in the way information is produced and consumed in our societies. Traditionally, information was produced by large news organizations, which broadcast the same carefully-edited information to all consumers over mass media channels. In contrast, in online social media, any user can be a producer of information, and every user selects which other users she connects to, thereby choosing the information she consumes. Furthermore, recommender systems deployed on most social media sites provide users with additional information that is tailored to their individual tastes.

In this talk, I will introduce the concept of an information diet, that is, the topical distribution of a given set of information items (e.g., tweets), to characterize the information produced and consumed by various types of users on the popular social media platform Twitter. At a high level, we find that (i) popular users mostly produce very specialized diets focusing on only a few topics; in fact, news organizations (e.g., NYTimes) produce much more focused diets on social media than their mass media diets, (ii) most users’ consumption diets are primarily focused on one or two topics of interest, and (iii) the personalized recommendations provided by Twitter help to mitigate some of the topical imbalances in users’ consumption diets by adding information on diverse topics beyond the users’ primary topics of interest.

Speaker: Krishna Gummadi, Max Planck Institute for Software Systems

An interdisciplinary study on the representation of street art on Flickr was conducted at Leibniz Universität Hannover. A sociologist, two computer scientists, and several research assistants were involved in the study. The starting point of the research project was Ulf Wuggenig's thesis that the Internet is also changing the ways in which street art is received and perceived. Rather than established art institutions, it is above all the Internet that would help to advance the representation and recognition of street artists. To examine this, a visual content analysis was carried out. The research project concluded that a street-art-"oriented" community exists on Flickr and ensures the representation of street art. In my talk, I will therefore discuss the possibilities of describing this community on the basis of available metadata, as well as how street art is represented on Flickr. In this context, I will also address further questions that have arisen for me from this research.

Speaker: Axel Philipps, Leibniz University of Hannover

Wikipedia is a huge global repository of human knowledge, and at the same time one of the largest experiments of online collaboration. Its articles, their links and the negotiations around their content tend to reflect societal debates in different language communities. The first work David Laniado will present is Contropedia, a platform that adds a layer of transparency to Wikipedia articles. Combining activity from the edit history and discussion in talk pages, the platform uses wiki links as focal points to explore the development of controversial issues over time. The second study focuses on the network of hyperlinks in different language editions of Wikipedia. A ranking of the most central biographies in each language edition is used to study relationships and influences between cultures. Finally, the speaker will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions, focusing on how emotion and dialogue differ depending on the editors' status, gender, and communication network.

Speaker: David Laniado, Barcelona Media

Modeling social media and blogging is a fashionable topic of research. The challenges include how to model, exploit, store, and analyze social content data. In this talk the speaker will discuss novel approaches to exploiting such data in order to (1) build new types of recommender systems and (2) discover and understand the evolution of topics over time.

Speaker: Puya Hossein Vahabi, Yahoo Labs Barcelona

Since the 1970s, urban theories proposed by Lynch and Milgram have aimed at understanding complex city dynamics. Can these theories be put to use to enable new mobile services? The answer is a definitive “Yes!”. Existing mapping technologies return the shortest directions. To complement them, we are designing new mobile phone tools that return directions that are not only short but also make the experience of pedestrians happier. To capture a fuzzy concept such as happiness, we have combined Flickr metadata with the crowdsourcing site, which I co-designed with colleagues at the University of Cambridge. This site crowdsources visual perceptions of quiet, beauty and happiness across the city of London using pictures of street scenes.

Speaker: Daniele Quercia, Yahoo Labs Barcelona

The problem of understanding the dynamics of collective attention has been identified as a key scientific challenge for the information age. In this talk, we first show that search behaviors of large populations of Internet users evolve in a highly regular manner and that the corresponding time series can be modeled using skewed distributions. We then ask whether such dynamics could be explained in terms of infectious processes that take place in social networks, and derive a physically plausible model for the temporal dynamics of graph diffusion processes. Our results are based on maximum entropy arguments and provide new approaches to problems in network analysis and mining.

Speaker: Christian Bauckhage, Fraunhofer IAIS

Online participatory media, such as social networking sites and forums, have changed Internet users from simple information consumers into active producers of online content, turning our societies into "digital democracies". As part of a Swiss SNF-funded project, we explore the dynamics of polarization of opinions and social structures through the digital traces left by politicians and voters in online participatory media.

Our first study focuses on Politnetz, a Swiss platform for political activity composed of support links between politicians, comments, and likes. We measured network polarization as the level of intra-party cohesion relative to inter-party cohesion, finding that support links show a very strongly polarized structure with respect to party alignment. We found that comment structures follow topics related to Swiss politics, and that polarization in likes evolves over time, increasing as the 2011 federal elections approached. Furthermore, we analyzed the internal social structure of each party through social network metrics related to hierarchical structures and information efficiency. This analysis highlights the relation between the connectivity patterns of parties and their political positions within a multi-party system.

Our second work analyzes the evolution of the 15M movement through its digital traces on Twitter. We analyzed the tweets related to the movement during the 30 days around its creation, illustrating the evolution and structure of the movement at both the collective and the individual level. We found patterns of influence of collective action and mass media on the polarization of opinions about the movement, and identified different stages of movement formation and expansion in Twitter activity. Our sentiment and psycholinguistic analysis of the content of tweets reveals that activity cascades with strong negative sentiment and social-related terms spread to larger numbers of users. At the individual level, we found that users who are more embedded in the movement display higher levels of activity and express stronger negativity, in line with the overall negative context of the movement.

Speaker: David Garcia, ETH Zürich

Though online social network research has exploded in recent years, little thought has been given to exploring the nature of the social structures that compose these networks. Online interactions have been interpreted as indicative of one social process or another (e.g., status exchange or trust), often with little systematic justification of the relation between observed data and theoretical concept. Our research aims to bridge this gap in computational social science by explaining the nature and purpose of social structures with quantitative metrics derived directly from longstanding concepts in the social sciences. In this talk we will discuss the characterization of social links and social groups. We propose a method based on Blau's notion of resource exchange that discovers, with high accuracy, the fundamental domains of interaction occurring over links in social networks. By applying this method to two online datasets that differ in scope and type of interaction (aNobii and Flickr), we observe the spontaneous emergence of three domains of interaction representing the exchange of status, knowledge and social support. By finding significant relations between the domains of interaction and classic social network analysis issues (e.g., tie strength, dyadic interaction over time), we show how the network of interactions induced by the extracted domains can be used as a starting point for more nuanced analysis of online social data that may one day incorporate the normative grammar of social interaction. We also explore the nature of online groups through the lens of common identity and common bond theory, defining a set of features to classify groups into these two categories. We show that the classification works with high accuracy on Flickr groups.

Speaker: Luca Maria Aiello, Yahoo Labs Barcelona

Wikipedia is the largest encyclopaedia in the world and seeks to "create a summary of all human knowledge". An encyclopaedia is supposed to contain a collection of objective facts, reported by secondary sources. However, the crowdsourced nature of Wikipedia makes it a source of information in itself, reflecting the interests, preferences, opinions, and priorities of the members of its community of editors. By analysing the editorial conflicts between the editors of different language editions, we can create interesting images of each language community's interests and concerns. Moreover, the page view statistics of Wikipedia articles provide a unique insight into the patterns of information seeking by its readers. In this presentation, we start with Wikipedia edit wars and discuss what the warring patterns can teach us about real-life facts. We then show three examples in which statistics of editorial activity and page views are considered as proxies to assess the popularity and visibility of items. The movie market, elections, and scientific reputation are the three topics we have investigated, and we observed that, under certain conditions, there is a high correlation between popularity and the volumes of Wikipedia edits and page views. Based on these correlations, and in the presence of external data to calibrate a predictive model, one is able to forecast the prospective success of an item in a reasonably accurate way.

Speaker: Taha Yasseri, Oxford Internet Institute

FuturICT is a global initiative pursuing a participatory approach, integrated across the fields of ICT, the social sciences and complexity science, to design socio-inspired technology and develop a science of global, socially interactive systems. The initiative wants to bring together, on a global level, Big Data, new modelling techniques and new forms of interaction, leading to a new understanding of society and its co-evolution with technology. The goal is to create a major scientific drive to understand, explore and manage our complex, connected world in a more sustainable and resilient manner.

The initiative is motivated by the fact that ubiquitous communication and sensing blur the boundaries between the physical and digital worlds, creating unparalleled opportunities for understanding the socio-economic fabric of our world, and for empowering humanity to make informed, responsible decisions for its future. The intimate, complex and dynamic relationship between global, networked ICT systems and human society directly influences the complexity and manageability of both. This also opens up the possibility to fundamentally change the way ICT will be designed, built and operated, reflecting the need for socially interactive, ethically sensitive, trustworthy, self-organized and reliable systems.

The plan is to build a new public resource: value-oriented tools and models to aggregate, access, query and understand vast amounts of data. Information from open sources, real-time devices and mobile sensors would be integrated with multi-scale models of the behaviour of social, technological, environmental and economic systems, which could be interrogated by policy-makers, business people and citizens alike. Together, these would build an eco-system leading to new business models, scientific paradigm shifts and more rapid and effective ways to create and disseminate new knowledge and social benefits, thereby forming an innovation accelerator.

Speaker: Dirk Helbing, ETH Zürich

The increasing availability of data across different socio-technical systems, such as online social media, mobile phone networks, and collaborative knowledge platforms, presents novel challenges and intriguing research opportunities. As more online services permeate our everyday life and as data from various domains are connected and integrated with each other, the boundary between the ‘real world’ and the ‘virtual online world’ becomes blurry. Scholars from different fields now have rich sources of information on individual behaviors at a scale that was hardly conceivable only a decade ago. Such data cover both online and offline activities of people, as well as multiple time scales, prompting a variety of research questions on human behaviors and activities in the real and online worlds. In this talk I will discuss two examples of how online and offline worlds interact and affect each other. In the first case, I'll show how online conversation on Twitter triggers and responds to real-world events in the context of the Occupy Wall Street social mobilization. In turn, the second example will illustrate how human mobility affects topics of discussion on such online platforms: I'll draw a parallel between information diffusion and epidemic spreading, showing that the dynamics driving the emergence of collective attention and trends are tightly interconnected with individuals' mobility in the real world.

Speaker: Emilio Ferrara, Indiana University Bloomington


Breaking out of the Echo Chamber: Understanding and Designing for Cross-ideology Discussions

Q. Vera Liao, University of Illinois at Urbana-Champaign
December 11, 2013

Twitter data as information source on political communication and political campaigns: Examples from Germany

Andreas Jungherr, University of Konstanz
November 27, 2013

PoliMedia − Connecting Political and Media Data

Laura Hollink, Centrum Wiskunde & Informatica
October 14, 2013