The GESIS Computational Social Science (CSS) Seminar is a monthly English-language event for expert exchange on topics in data science and social analytics.
Recommender systems have become increasingly pervasive in our daily lives, supporting us in identifying relevant content in an overloaded information space. Much of the research in the Recommender Systems community has focused on building (mostly data-driven) recommendation models, which make strong and sometimes oversimplified assumptions about human behavior and preferences. In this talk, Elisabeth Lex will show how psychological insights can be used to develop new recommender algorithms that better reflect and predict user behavior. First, a hashtag recommendation algorithm is introduced that mimics how people access information in their long-term memory. Ms. Lex and her colleagues found that temporal effects play a strong role in hashtag usage. Second, a computational model of human category learning is used to improve Collaborative Filtering by incorporating non-linear user-resource dynamics. Finally, the recent work of Elisabeth Lex and her colleagues on echo chambers and algorithmic fairness in recommender systems will be discussed.
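The abstract does not name the memory model behind the hashtag recommender. As a rough illustration of how temporal effects can drive such a recommender, the sketch below assumes the base-level learning equation from the ACT-R cognitive architecture, in which each past use of a hashtag leaves a trace that decays as a power law of its age. The function names, the decay value of 0.5 and the usage history are hypothetical.

```python
import math

def base_level_activation(use_times, now, decay=0.5):
    """Log of the summed power-law-decayed traces, one per past use of a hashtag."""
    traces = [(now - t) ** -decay for t in use_times if t < now]
    return math.log(sum(traces)) if traces else float("-inf")

def recommend(history, now, k=2):
    """Return the k hashtags with the highest activation at time `now`."""
    scores = {tag: base_level_activation(times, now) for tag, times in history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical usage history: hashtag -> times of past use (in hours).
history = {
    "#css": [1, 2, 3],      # used often, but long ago
    "#recsys": [95, 99],    # used less often, but recently
    "#tbt": [50],
}
top_tags = recommend(history, now=100)
```

In this toy history the recently used hashtag outranks the more frequently used one, which is the kind of recency effect the abstract describes.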
Assist.-Prof. Dr. Elisabeth Lex is an assistant professor at Graz University of Technology and heads the Social Computing research area at Know-Center, Austria's Research Center for Data-driven Business and Big Data Analytics. Her research interests include Recommender Systems, Social Network Analysis, Data Science, and Open Science. Elisabeth has been work package leader in the FP7 IP Learning Layers project, scientific coordinator of the Marie Curie IRSES Web Information Quality Evaluation Initiative (WIQ-EI) project, and task leader in the H2020 Analytics for Everyday Learning (AFEL) project. Recently, Elisabeth has been a member of the Expert Group on Altmetrics, which advised the European Commission, DG Research and Innovation, on how to use Social Media signals to measure scientific impact. Elisabeth has (co-)authored 60+ peer-reviewed publications and regularly acts as reviewer and chair for major international conferences and journals. Among other courses at Graz University of Technology, Elisabeth teaches Web Science, and she will start a new Complex Systems course in 2018.
Speaker: Elisabeth Lex, Graz University of Technology
As publishing has become more and more accessible and basically cost-free, virtually anyone can get their words printed, whether online or on paper. Such ease of disseminating content doesn't necessarily go together with author identifiability. In other words: it's very simple for anyone to publicly write any text, but it isn't equally simple to always tell who the author of a text is. Telling the author of a text can be thought of at various levels of detail. For example, in some contexts, and possibly in the interest of companies who want to advertise, or legal institutions, it can correspond to profiling, namely defining certain characteristics of the author, such as sex and age. In other contexts, and in the interest also of ancient and contemporary literary or historical studies, identifying authors can mean being able to tell whether two texts are likely to have been written by the same person. The latter problem can take more than one form in practice, as one could be faced with one unknown text to compare to another one written by a known author, or could be given a large number of unknown texts to be clustered according to authorship. To what extent is all this feasible? And is it meaningful?
In this talk, Malvina Nissim will discuss the specifics of such tasks, and describe a couple of systems that perform author profiling and author verification on different kinds of texts from different languages, experimenting with various linguistic and structural features. Ms. Nissim will also discuss such systems and their performance not only in terms of how they fare, but also in terms of what it means to profile and identify authors, and what challenges lie ahead for people working in this field.
Malvina Nissim is Associate Professor in Language Technology at the University of Groningen. She has extensive experience in modelling language phenomena from a computational perspective, with particular attention to sentiment analysis and author identification and profiling, especially on social media. She has (co-)authored 90+ peer-reviewed publications, and regularly serves as reviewer/chair for major international conferences and journals. She graduated in Linguistics at the University of Pisa, and obtained her PhD in Computational Linguistics at the University of Pavia, in collaboration with the University of Edinburgh. Before joining the University of Groningen, she was a tenured researcher at the University of Bologna (2006-2014), and a post-doc at the University of Edinburgh (2001-2005) and the National Research Council in Rome (2005-2006). She is the University of Groningen's 2016 Lecturer of the Year.
Speaker: Malvina Nissim, Language Technology, University of Groningen
In recent years, social media and online social networking sites have become a major disseminator of false facts, urban legends, fake news, or, more generally, misinformation. To overcome this problem, online platforms are, on the one hand, empowering their users (the crowd) with the ability to evaluate the content they are exposed to and, on the other hand, resorting to trusted third parties for fact checking stories. However, given the noise in the evaluations provided by the crowd and the high cost of fact checking, the above-mentioned measures require careful reasoning and smart algorithms. In this talk, the author will first describe a modeling framework based on marked temporal point processes that links noisy evaluations provided by the crowd to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Then, the author will introduce a scalable online algorithm, CURB, to select which stories to send for fact checking and when to do so to efficiently reduce the spread of fake news and misinformation with provable guarantees. Finally, Manuel Gomez Rodriguez will show the effectiveness of his team's modeling framework and their algorithm using real-world data gathered from Wikipedia, Stack Overflow, Twitter and Weibo. This talk includes joint work with Behzad Tabibian, Jooyeon Kim, Isabel Valera, Mehrdad Farajtabar, Le Song, Alice Oh and Bernhard Schoelkopf.
Manuel Gomez Rodriguez is a tenure-track faculty member at the Max Planck Institute for Software Systems. Manuel develops machine learning and large-scale data mining methods for the analysis, modeling and control of large social and information online systems. He is particularly interested in the creation, acquisition and/or dissemination of reliable knowledge and information, which is ubiquitous in the Web and social media, and has received several recognitions for his research, including an Outstanding Paper Award at NIPS’13 and a Best Research Paper Honorable Mention at KDD’10 and WWW’17. Manuel holds a BS in Electrical Engineering from Carlos III University in Madrid (Spain), an MS and PhD in Electrical Engineering from Stanford University, and has received postdoctoral training at the Max Planck Institute for Intelligent Systems. You can find more about him at http://learning.mpi-sws.org.
Speaker: Manuel Gomez Rodriguez, Max Planck Institute for Software Systems
Online Social Networks (OSNs) are increasingly being used as a platform for effective communication, to engage with other users, and to create social worth. The more likes, followers and shares a user receives on an OSN platform, the more the user's social self-worth increases. Such metrics and crowdsourced ratings give the OSN user a sense of social reputation, which she tries to maintain and boost in order to be more influential in the network and attract more followers. Users sometimes artificially bolster their social reputation via blackmarket web services and crowdsourced manipulation services. In this talk, the author will describe various approaches to detect users with manipulated social reputation. The author and her colleagues have formulated an effective method which estimates the genuine social reputation of users with manipulated social metrics. The talk thus goes a step further: it not only detects users with manipulated social reputation, but also predicts the correct social reputation of a user. Anupama Aggarwal and her colleagues use various attack models, and show that their prediction of a user’s social reputation is tolerant against blackmarket services and crowdsourced manipulation.
Anupama Aggarwal is a PhD student at IIIT-Delhi, India, and a member of Precog@IIITD. Her research focuses on the study of anomalous user behavior on online social networks. In general, her interests are social computing and data mining on social graphs to understand user behavior.
Speaker: Anupama Aggarwal, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi)
The self-measurement boom is linked to many risks despite euphoric assessments and promises of benefit by developers, pioneers and companies. Lifelogging (the sum of all technologies and applications used for digital self-measurement) as a ‘disruptive’ technology is changing our cultural matrix and hereby the institutionalized rules of coexistence. The cultural baseline that is currently changing is the manner in which the quantifiable consumer regards something as normal and socially desirable.
Measuring humans has always been an expression of rationalisation tendencies that have social implications. Over time, these tendencies have led to a new image of humanity, which is currently experiencing an update. The modern image of society is characterised by the translation of concrete objects and complex qualitative processes into abstract quantities. Lifelogging technologies have proven themselves to be an outstanding medium for that. The measurement of man and the reduction of such to a numerical object and a mere data set is creating a negative principle of organisation of the social. Self observation based on digital data is not only becoming more exact, it is also becoming increasingly divisive. The counter term to rational differentiation is, therefore, rational discrimination. This resulting phenomenon is located as a pathology of quantification between statistical and social discrimination, and analysed in its consequences.
From the perspective of cultural anthropology, digital self measurement is nothing more than a modern-day return to the alchemistic principle. The starting point is the ‘common’ person, the human who is not yet fully developed, or the human who represents a risk or a source of error or disturbance for society. With the help of quantification, one’s lifestyle is said to become more rational. And in accordance with social standards ‘common’ people should be transformed into ‘precious’ people. The effects of this digital transformation are explained against the background of theories about convivial tools (Ivan Illich), greedy institutions (Lewis Coser) and the outsourced self (Arlie Hochschild) in a society of assistance.
Stefan Selke is professor of “Sociology and Social change” at the Furtwangen University (http://en.hs-furtwangen.de) in Germany. He is also a research professor for “Transformative and Public Science”. His current research interests are the economy of poverty, reputation capital in the charity market, public sociology and the digitalisation of society.
Speaker: Stefan Selke, Furtwangen University
Nowadays, music aficionados generate millions of listening events every day and share them via services such as Last.fm or Twitter. In 2016, the LFM-1b dataset (http://www.cp.jku.at/datasets/LFM-1b), containing more than 1 billion listening events of about 120,000 Last.fm users, was released to the research community and interested public. Since then, we have performed various data analysis and machine learning tasks on these large amounts of user and listening data. The insights gained helped us develop new listener models and integrate them into music recommender systems, in an effort to increase personalization of the recommendations. In this talk, I will elaborate on the following research topics we have targeted in the past two years:
Speaker: Markus Schedl, Johannes Kepler University Linz, Department of Computational Perception
Social media has revolutionized how people are exposed to information and how they consume news. Beyond the undoubtedly large number of advantages and capabilities brought by social-media platforms, a point of criticism has been the creation of filter bubbles or echo chambers, caused by social homophily as well as by algorithmic personalization and recommendation in content delivery. In this talk, I will present the methods we developed to (i) detect and quantify the existence of polarization on social media, (ii) monitor the evolution of polarization over time, and finally, (iii) overcome the effects caused by increased polarization. We combine existing studies and ideas from social science with principles from graph theory to design algorithms that are language independent, domain agnostic and scalable to large numbers of users.
Kiran Garimella is a PhD student at Aalto University. His research focuses on identifying and combating polarization on social media. In general he is interested in making use of large public datasets to understand human behaviour. Prior to starting his PhD, he worked as a Research Engineer at Yahoo Research, QCRI and as an intern at Carnegie Mellon University, LinkedIn and Amazon. His work on polarization received the best student paper award at WSDM’17 and a best paper nomination at WebScience 2017.
Speaker: Kiran Garimella, Aalto University
The talk argues for the importance of forbidden triads (open triads with high weight edges) in predicting success in creative fields. Forbidden triads have been treated as a residual category beyond closed and open triads, yet we argue that they provide opportunities to combine socially evolved styles in new ways. Using data on the entire history of recorded jazz from 1896 to 2010, we show that observed collaborations have tolerated the openness of high weight triads more than expected, that observed jazz sessions had more forbidden triads than expected, and that the density of forbidden triads contributed to the success of recording sessions, measured by the number of releases out of the sessions’ material. The author also shows that the sessions of Miles Davis received an especially high boost from forbidden triads.
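To make the definition concrete, the following minimal sketch counts forbidden triads in a small weighted collaboration network. It assumes that a tie counts as high weight when it reaches a chosen cutoff and that a triad is open when the third tie is entirely absent. The cutoff, the function name and the toy data are hypothetical and not taken from the talk.

```python
from itertools import combinations

def forbidden_triads(weights, strong=3):
    """Return open triads whose two existing ties are both strong.

    weights maps frozenset({a, b}) to a tie weight (e.g. number of joint
    sessions); strong is the assumed cutoff for a high-weight tie.
    """
    neighbours = {}
    for pair, weight in weights.items():
        if weight >= strong:
            a, b = tuple(pair)
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    triads = []
    for centre, strong_ties in neighbours.items():
        for a, c in combinations(sorted(strong_ties), 2):
            # The triad is open if the two outer nodes share no tie at all.
            if frozenset((a, c)) not in weights:
                triads.append((a, centre, c))
    return triads

# Hypothetical collaboration counts between four musicians A-D.
weights = {
    frozenset(("A", "B")): 5,
    frozenset(("B", "C")): 4,
    frozenset(("B", "D")): 3,
    frozenset(("C", "D")): 1,
    frozenset(("A", "D")): 2,
}
triads = forbidden_triads(weights)
```

Here B works closely with both A and C, who never recorded together, so (A, B, C) is the only forbidden triad. A different reading of "open" (for example, treating a weak third tie as open) would change the count.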
Speaker: Balazs Vedres, Central European University
We are witnessing a momentous transformation in the way people interact and exchange information with each other. Content is now co-produced, shared, classified and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This content explosion is to a large extent driven by a mix of novel technologies with a deep human drive for recognition.
This talk will describe the regularities that govern how social attention is allocated among all media and the role it plays in the production and consumption of content. It will also describe how the dynamics of attention not only help determine the emergence of public agendas but can also be used to predict the evolution of social trends.
Speaker: Bernardo Huberman, HP Labs and Stanford University
The many decisions people make about what information to attend to affect emerging trends, the diffusion of information in social media, and performance of crowds in peer evaluation tasks. Due to constraints of available time and cognitive resources, the ease of discovery strongly affects how people allocate their attention. Through empirical analysis and online experiments, we identify some of the cognitive heuristics that influence individual decisions to allocate attention to online content and quantify their impact on individual and collective behavior. Specifically, we show that the position of information in the user interface strongly affects whether it is seen, while explicit social signals about its popularity increase the likelihood of response. These heuristics become even more important in explaining and predicting behavior as cognitive load increases. The findings suggest that cognitive heuristics and information overload bias collective outcomes and undermine the “wisdom of crowds” effect.
Kristina Lerman is a Project Leader at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. Trained as a physicist, she now applies network- and machine learning-based methods to problems in social computing and social media analysis.
Speaker: Kristina Lerman, University of Southern California
For decades, physical behavioral labs have been a primary, yet limited, method for controlled experimental studies of human behavior. Now, software-based "virtual labs" on the Internet allow for studies of increasing complexity, size, and scope. In this talk, I highlight the potential of virtual lab experiments for studying social interaction and coordination. First, we explore collective intelligence and digital teamwork in "crisis mapping", where digital volunteers organize to assess and pinpoint damage in the aftermath of humanitarian crises. We simulate a crisis mapping scenario to study self-organization in teams of varying size, and find a tradeoff between individual effort in small groups and collective coordination in larger teams. We also conduct a study of cooperation in a social dilemma over a month of real time, using crowdsourcing participants to overcome the time constraints of behavioral labs. Our study of about 100 participants over 20 consecutive weekdays finds that a group of resilient altruists sustains a high level of cooperation across the entire population. Together, our work motivates the potential of controlled, highly instrumented studies of social interaction; the importance of behavioral experiments on longer timescales; and how open-source software can both speed up the iteration and improve the reproducibility of experimental work.
* based on joint work with Lili Dworkin, Winter Mason, Siddharth Suri, and Duncan Watts.
Andrew Mao is currently a postdoctoral researcher in Computational Social Science at Microsoft Research in NYC. His research focuses on studying collective intelligence and social interaction on the Internet, such as teamwork in online communities and coordination in crowdsourcing systems. Andrew specializes in designing and gathering data from real-time, interactive, web-based behavioral experiments, and he is the designer of TurkServer (http://turkserver.readthedocs.io/), an open-source platform for building such experiments. His work has appeared in journals including Nature Communications and PLoS ONE as well as computer science conferences such as AAAI, EC, and HCOMP. He received his PhD from Harvard University in 2015.
Speaker: Andrew Mao, Microsoft Research NYC
Technology has advanced to a point where a large part of the population carries a mini-computer, disguised as a phone, in their pockets. The role of these devices has changed from simple communication to multi-functional information devices. They are packed with various sensors, such as GPS, gyroscopes, and accelerometers, which can collect contextual data about the device and the user. Thanks to a multitude of applications, they satisfy a large spectrum of different needs and stay with their users most of the time. Smartphone usage is a hot topic in ubiquitous and pervasive computing due to the devices' popularity and personal nature.
The Menthal team has developed the Menthal framework (https://menthal.org) for collecting and analyzing mobile users' data. It is part of one of the largest in-the-wild smartphone studies. They attracted a large number of participants by running the study in a start-up format: building a product that users want and then promoting it through media outlets. Since the launch of the project in January 2014, their app has been installed more than 400,000 times and their project has attracted more than 350,000 registered participants. From these, they have collected general phone measurements, such as time spent on the phone using apps or communicating, but also other interesting data such as mood/affect measurements and Big Five personality traits. The collected data allow the study of a number of problems in HCI, psychological sciences and medicine. In this talk they will present their framework from a technical point of view, and afterwards they will discuss past and current results from their research project.
Speaker: Ionut Andone, University of Bonn
The way we express ourselves is heavily influenced by our demographic background. For example, we don't expect teenagers to talk the same way as retirees. Natural Language Processing (NLP) models, however, are based on a small demographic sample and approach all language as uniform. As a result, NLP models perform worse on language from demographic groups that differ from the training data, i.e., they encode a demographic bias. This bias harms performance and can disadvantage entire user groups.
Sociolinguistics has long investigated the interplay of demographic factors and language use, and it seems likely that the same factors are also present in the data we use to train NLP systems.
In this talk, I will show how we can combine statistical NLP methods and sociolinguistic theories to the benefit of both fields. I present ongoing research into large-scale statistical analysis of demographic language variation to detect factors that influence the performance (and fairness) of NLP systems, and how we can incorporate demographic information into statistical models to address both problems.
Speaker: Dirk Hovy, Computer Science department (DIKU), University of Copenhagen
With the increase in Internet usage, there has been an exponential increase in the use of online social media. Websites like Facebook, Google+, YouTube, Orkut, Twitter and Flickr have changed the way the Internet is being used. There is a dire need to investigate, measure, and understand privacy and security on online social media from various perspectives (computational, cultural, psychological). Real world scalable systems need to be built to detect and defend against security and privacy issues on online social media. I will describe briefly some cool projects that we work on: TweetCred, OSM & Policing, OCEAN, and Call Me MayBe. Much of our research work is made available for public use through tools or online services. Our work derives techniques from Computational Social Science, Data Science, Statistics, Network Science, and Human Computer Interaction. In particular, in this talk, I will focus on the following: (1) TweetCred, a tool to extract intelligence from Twitter which can be useful to security analysts. TweetCred is backed by award-winning research publications in international and national venues. (2) How police in India are using online social media, and how we can use computer science understanding to help police engage more with citizens and increase safety in society. (3) OCEAN: Open source Collation of eGovernment data and Networks, which shows how publicly available information on Government services can be used to profile citizens in India. This work obtained the Best Poster Award at the Security and Privacy Symposium at IIT Kanpur, 2013, and it has gained a lot of traction in Indian media. (4) Given an identity in one online social media service, we are interested in finding the digital footprint of the user in other social media services; this is also called the digital identity stitching problem. This work is also backed by an award-winning research publication.
Speaker: Ponnurangam Kumaraguru, Indraprastha Institute of Information Technology (IIIT), Delhi, India
Research into socio-technical systems like Wikipedia has overlooked important structural patterns in the coordination of distributed work. This paper argues for a conceptual reorientation towards sequences as a fundamental unit of analysis for understanding work routines in online knowledge collaboration. I outline a research agenda for computational social science researchers to understand the relationships, patterns, antecedents, and consequences of sequential behavior extending methods already developed in fields like sociology and bio-informatics. Using a data set of 37,515 revisions from 16,616 unique editors to 96 Wikipedia articles as a case study, we analyze the prevalence and significance of different sequences of editing patterns. We illustrate the mixed method potential of sequence approaches by interpreting the frequent patterns as general classes of behavioral motifs. We conclude by discussing the methodological opportunities for using sequence analysis for expanding existing approaches to analyzing and theorizing about co-production routines in online knowledge collaboration.
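As a rough illustration of treating sequences as the unit of analysis, the sketch below counts how often short runs of consecutive actions occur across revision histories. It is a simple n-gram count, assumed here as a stand-in and not the sequence analysis method used in the paper. The action labels and histories are invented for illustration.

```python
from collections import Counter

def frequent_ngrams(sequences, n=2, top=3):
    """Count every run of n consecutive actions and return the most common ones."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts.most_common(top)

# Hypothetical revision histories, one list of coded edit actions per article.
histories = [
    ["add", "copyedit", "add", "revert"],
    ["add", "revert", "add", "revert"],
    ["copyedit", "add", "revert"],
]
most_frequent = frequent_ngrams(histories, n=2, top=1)
```

In this toy data the pair ("add", "revert") is the most frequent motif. Interpreting such frequent patterns as behavioral motifs is the qualitative step the abstract refers to.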
Speaker: Brian Keegan, Harvard Business School
Determining the relative centrality of actors, or the degree to which they are structurally important, is one of the most common techniques in social network analysis. Many indices have been proposed to measure a variety of centrality conceptions, and choosing the one that is most appropriate for the particular research question and data at hand proves a challenge in many empirical studies. We use a general result about all common centrality indices to motivate a re-conceptualization of centrality. Our new approach is the first instantiation of a recently introduced positional framework for network analysis. By breaking down complex analyses into comprehensible steps, multivariate data and theoretical assumptions can be integrated more flexibly. Several examples serve to illustrate this point.
Speaker: Ulrik Brandes, University of Konstanz
Characterising how we explore abstract spaces is key to understanding our (ir)rational behaviour and decision making. While some light has been shed on the navigation of semantic networks, little is known about the mental exploration of metric spaces, such as the one-dimensional line of numbers, prices, etc. Here we address this issue by investigating the behaviour of users exploring the “bid space” in online auctions. We find that they systematically perform Lévy flights, i.e., random walks whose step lengths follow a power-law distribution. Interestingly, this is the best strategy that can be adopted by a random searcher looking for a target in an unknown environment, and has been observed in the foraging patterns of many species. In the case of online auctions, we measure the power-law scaling over several decades, providing the neatest observation of Lévy flights reported so far. We also show that the histogram describing single individual exponents is well peaked, pointing out the existence of an almost universal behaviour. Furthermore, a simple model reveals that the observed exponents are nearly optimal, and represent a Nash equilibrium. We rationalise these findings through a simple evolutionary process, showing that the observed behaviour is robust against invasion of alternative strategies. Our results show that humans share with other animals universal patterns in general searching processes, and raise fundamental issues in cognitive, behavioural and evolutionary sciences.
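For readers unfamiliar with Lévy flights, the minimal sketch below generates a one-dimensional random walk whose step lengths follow a power law p(l) ∝ l^(-mu) for l ≥ l_min, using inverse-transform sampling. The exponent mu = 2, the minimum step length and the function names are assumptions for illustration and are not values reported in the talk.

```python
import random

def levy_steps(n, mu=2.0, l_min=1.0, seed=0):
    """Draw n step lengths from p(l) ~ l**(-mu) for l >= l_min (requires mu > 1)."""
    rng = random.Random(seed)
    # Inverse-transform sampling: if u is uniform on [0, 1), then
    # l_min * (1 - u) ** (-1 / (mu - 1)) follows the power law above.
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]

def levy_walk_1d(n, mu=2.0, l_min=1.0, seed=0, start=0.0):
    """Random walk on a line: power-law step lengths, each with a random sign."""
    sign_rng = random.Random(seed + 1)
    position = start
    path = [position]
    for length in levy_steps(n, mu, l_min, seed):
        position += length if sign_rng.random() < 0.5 else -length
        path.append(position)
    return path

steps = levy_steps(10_000, mu=2.0)
path = levy_walk_1d(1_000, mu=2.0)
```

Most steps are short, but the heavy tail occasionally produces very long jumps, which is what distinguishes a Lévy flight from an ordinary random walk with bounded steps.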
Speaker: Andrea Baronchelli, City University London
Human mobility has been a hot topic of interest for researchers due to its importance for many application scenarios, including nearby place search, mobile context awareness and mobile advertising. Despite the bevy of research on the analysis and predictive modeling of individual users' mobility patterns, little attention has been paid to the mobility patterns of user collectives across places in a city. In this talk, we will exploit network analysis techniques to view human movement in urban environments from the perspective of an aggregate networked system where nodes are Foursquare venues. We will discuss the geometric properties of place networks in a large number of metropolitan areas around the world and how those compare to other well studied types of networks, such as on-line social networks or the web.
Next, we will shed light on the growth patterns of place networks in terms of node and edge generation processes. Motivated by the fact that a large number of new links emerge over time in those networks, we will define a link prediction task in this novel application domain with the aim of predicting future interactions between Foursquare venues. The talk will close with a head-to-head comparison on the prediction task between gravity models, which are well known in the human mobility literature, network-based techniques, and supervised learning algorithms.
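As a hedged illustration of the gravity-model baseline, the sketch below scores a pair of venues by the product of their check-in counts divided by a power of the distance between them, and ranks candidate pairs by that score. The choice of check-ins as the "mass", the exponent of 2, the venue names and the coordinates are all assumptions and are not details given in the talk.

```python
import math
from itertools import combinations

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def gravity_score(venue_a, venue_b, exponent=2.0):
    """Gravity-style score: product of venue masses over distance**exponent."""
    # A small floor on the distance avoids division by zero for co-located venues.
    distance = max(haversine_km(venue_a["pos"], venue_b["pos"]), 0.01)
    return venue_a["checkins"] * venue_b["checkins"] / distance ** exponent

# Hypothetical venues with invented positions and check-in counts.
venues = {
    "cafe": {"pos": (52.2053, 0.1218), "checkins": 120},
    "station": {"pos": (52.1943, 0.1371), "checkins": 300},
    "airport": {"pos": (51.8860, 0.2389), "checkins": 300},
}
ranked = sorted(
    (((a, b), gravity_score(venues[a], venues[b]))
     for a, b in combinations(sorted(venues), 2)),
    key=lambda item: item[1],
    reverse=True,
)
```

The highest-ranked pairs would be the predicted future links. In this toy example the nearby cafe and station outrank either pairing with the distant airport, even though the airport has many check-ins.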
Speaker: Anastasios Noulas, University of Cambridge
In the era of big data and social media analysis, as a way forward, I propose an alternative to vanity metrics or the quantification of trend and personal influence. Rather, for the study of Twitter, Facebook and other secondary social media, I would like to put forward a critical data analytics that is sensitive to big data critique on the one hand and embraces analytical strategies with digital methods based on expertise and engagement on the other hand, making findings and outputting visualisations which are both insightful for (ethical) social research and aware of the hegemony of the graph.
Richard Rogers is Department Chair of Media Studies and Professor of New Media and Digital Culture at the University of Amsterdam. He is author most recently of Digital Methods (MIT Press, 2013), winner of the ICA outstanding book award, and Issue Mapping for an Ageing Europe (Amsterdam University Press, 2015), with Natalia Sanchez and Aleksandra Kil. He is Director of the Digital Methods Initiative and the Govcom.org Foundation, known for online mapping tools such as the Issue Crawler and the Lippmannian Device. He has received research grants from the Ford Foundation, Gates Foundation, MacArthur Foundation, Open Society Institute and Soros Foundation, and has worked with such NGOs as Greenpeace International, Human Rights Watch, Association for Progressive Communications, Women on Waves, Carbon Trade Watch and Corporate Observatory Europe.
Speaker: Richard Rogers, Digital Methods Initiative, University of Amsterdam
Traditionally, most football statistics and media coverage have focused almost exclusively on goals and (occasionally) shots. However, most of a football game is spent away from the boxes, passing the ball around. The way a team passes the ball around is the most characteristic measurement of its “unique style”. In this talk we will showcase how the study of a passing network keeps track of the team’s playing style, and how network invariants such as PageRank provide an adequate measurement of players’ involvement. Next, we will proceed further into the analysis of passing sequences and their likely outcomes, and show how passing patterns allow us to construct a “digital fingerprint” of a player’s style.
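To show how PageRank can be read as a measure of involvement, the sketch below runs a plain power-iteration PageRank on a small weighted passing network, where an edge from one player to another carries the number of completed passes. The damping factor of 0.85, the number of iterations and the pass counts are assumptions made for illustration only.

```python
def pagerank(passes, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted directed passing network.

    passes maps (passer, receiver) to the number of completed passes.
    """
    players = sorted({player for pair in passes for player in pair})
    out_weight = {player: 0.0 for player in players}
    for (src, _dst), count in passes.items():
        out_weight[src] += count

    rank = {player: 1.0 / len(players) for player in players}
    for _ in range(iterations):
        # Rank held by players with no outgoing passes is spread evenly.
        dangling = sum(rank[p] for p in players if out_weight[p] == 0)
        base = (1.0 - damping + damping * dangling) / len(players)
        new_rank = {player: base for player in players}
        for (src, dst), count in passes.items():
            new_rank[dst] += damping * rank[src] * count / out_weight[src]
        rank = new_rank
    return rank

# Hypothetical pass counts between four players (invented for illustration).
passes = {
    ("keeper", "defender"): 12,
    ("defender", "midfielder"): 20,
    ("defender", "keeper"): 4,
    ("midfielder", "defender"): 8,
    ("midfielder", "striker"): 15,
    ("striker", "midfielder"): 6,
}
scores = pagerank(passes)
```

A player scores highly when they receive many passes from teammates who are themselves heavily involved. In this toy network that is the midfielder.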
Speaker: Javier López Peña, University College London
The convergence of social and technical systems provides us with a wealth of data on the structure and dynamics of social organizations. It is tempting to utilize these data in order to better understand how social organizations evolve, how their structure is related to their "success", and how the position of individuals in the emerging social fabric affects their performance and motivation. Taking a complex network perspective on these questions, in this talk I will introduce recent research results obtained in the context of collaborative software engineering. These results demonstrate the potential of network-based data mining methods in the study of social organizations. At the same time, I will highlight fallacies arising in the application of the complex networks perspective to social systems.
Speaker: Ingo Scholtes, ETH Zürich
It has become popular to tap into the "intelligence of the crowd" on the Internet. This talk argues that more often than not, the crowd flips from intelligence to madness, showing more characteristics of football hooligans than complex problem solving behavior. This is in contrast to what I call "creative swarms", where small teams of intrinsically motivated people work together in Collaborative Innovation Networks (COINs) to invent something radically new. The key difference is in motivation: crowds are motivated by money, power and glory, while swarms are intrinsically motivated by the problems they are trying to solve.
The talk introduces a collaboration scorecard made up of six key variables – “honest signals” – indicative of creative swarms. The variables are computed by analyzing global communication on the Web, in Twitter, and Wikipedia, in organizations through e-mail, and in small teams through sociometric badges.
The talk is illustrated by many examples, with emphasis on high-tech firms and healthcare. For instance, it illustrates how customer satisfaction and employee attrition are predicted in a large Indian outsourcing company by analyzing the company’s e-mail archive. It also introduces the Chronic Collaborative Care Network (C3N) at Cincinnati Children's Hospital, where COINs of medical researchers, physicians, patients and their families are working together to improve the lives of patients with Crohn's disease, diabetes, and cystic fibrosis. Analyzing the e-mail archive of the C3N innovation teams and providing them with a process called “virtual mirroring”, in which the communication behavior of creative teams is mirrored back to them, helps them to increase creativity through improved communication.
Speaker: Peter A. Gloor, MIT Sloan School of Management
Query-specific Wikipedia Construction
We all turn to Wikipedia with questions we want to know more about, but eventually find ourselves at the limit of its coverage. Instead of providing "ten blue links", as is common in Web search, my goal is to answer any web query with something that looks and feels like Wikipedia. I am developing algorithms to automatically retrieve, extract, and compile a knowledge resource for a given web query. I will talk about a supervised retrieval model that can jointly identify relevant Web documents and Wikipedia entities and extract support passages [1,2]. (For a web demo with some example queries see: http://smart-cactus.org/~dietz/knowport/)
Network Topic Models
Topic models such as Latent Dirichlet Allocation are an unsupervised technique for extracting word clusters with topical character from a given corpus of text documents. Often we find text documents with an underlying link structure, or a network in which nodes are associated with text content. It is often assumed that connected nodes have some shared trait or interest that motivated the forming of the connection. In this talk, I will discuss several topic model extensions for textual network data. This includes the Citation Influence Model, which quantifies the strength of citations in an acyclic graph through a topic model. Furthermore, I will discuss the Shared Taste Model, which learns topics that capture shared interests in an undirected social network. As communication between users is often off-limits due to privacy concerns, the model learns from public text written by users, such as tweets, tags, and posts. The goal is to predict which friend of the user is interested in the content. The source code for both models is available on GitHub.
Speaker: Laura Dietz, Center for Intelligent Information Retrieval (CIIR) at University of Massachusetts
8. September 2015, Robert West, Improving Website Hyperlink Structure Using Server Logs / Media Coverage of Death
Recent work: Improving Website Hyperlink Structure Using Server Logs
Good websites should be easy to navigate via hyperlinks, yet maintaining a link structure of high quality is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes are frequent. To support human editors, we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of as yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia and Simtk.org. (Joint work with Ashwin Paranjape and Jure Leskovec of Stanford, and Leila Zia of Wikimedia)
Ongoing work: Media Coverage of Death
Death is an inevitable fact of the human condition and as such draws much attention. The deaths of famous people tend to be widely covered by the media in the form of obituaries and news articles, and may lead to a sustained change in the way their lives are collectively remembered. In this work in progress, we ask the question how deceased famous people are remembered by the media. To shed light on this question, we identify a set of notable people deceased during the six years from 2008 to 2014 and track them in a large corpus of news articles and blog posts spanning the entire six-year period. Our results show that death generally has a profound impact on how people are perceived by the media. Further, we find that posthumous media coverage varies with the circumstances of death and the biographic background of the deceased. (Joint work with Jure Leskovec and Christopher Potts of Stanford)
Speaker: Robert West, InfoLab at Stanford University
Abstract: At the beginning of 2014, in response to growing concerns about the role played by data mining and machine learning algorithms in decision-making, US President Obama called for a 90-day review of big data collection and analysis practices. The resulting report concluded that “big data technologies can cause societal harms beyond damages to privacy”. In particular, it expressed concerns about the possibility that decisions informed by big data could have discriminatory effects, even in the absence of discriminatory intent, further imposing less favorable treatment on already disadvantaged groups. In its recommendations to the President, the report called for additional "technical expertise to stop discrimination", and for further research into the dangers of "encoding discrimination in automated decisions".
In parallel to development in anti-discrimination legislation, efforts at fighting discrimination have led to developing anti-discrimination techniques in data mining. Some proposals are oriented to the discovery and measurement of discrimination, while others deal with preventing data mining from becoming itself a source of discrimination, due to automated decision making based on discriminatory models extracted from inherently biased datasets. In this talk, I will introduce some of the recent techniques for discrimination prevention, simultaneous discrimination and privacy protection, and discrimination discovery and show some recent results.
Speaker: Sara Hajian, Eurecat-Technology center of Catalonia
Abstract: Due to its low acquisition cost and sheer scale, social media is becoming a popular data source for studies on tracking health trends at scale. These studies usually take a population-centric approach in which the “ground truth” used for validation and model fitting is derived either from time series data, e.g. from temporal influenza activity, or from geographically varied data, e.g. from county-level obesity rates in the US. In this talk, I will present recent and ongoing work that uses social media data to study lifestyle diseases such as obesity. The first line of work takes the population-centric approach and uses food mentions on Twitter to study obesity. I will then move on to individual-centric health studies of obesity and dieting. Such fine-grained analysis is made possible by (i) labeling individual users as “is overweight or not” using their profile pictures, and (ii) analyzing users whose internet-enabled smart scales tweet their weight. I’ll conclude by outlining a vision of how social media data, physical sensor data, and electronic health records could be combined in a clinical setting to provide a more holistic view of a patient’s health.
Speaker: Ingmar Weber, Qatar Computing Research Institute
Abstract: Prominent data scientists have declared “the end of theory” in the era of big data. I argue that it is rather the beginning, due to new opportunities for a relational theoretical framework. From astronomy to neuroscience to particle physics, scientific knowledge depends decisively on the available tools for observation. For the past century (or more), the survey has been the single most important observational tool for social science. During this time, enormous advances have taken place in our ability to reduce systematic bias in sampling hidden populations, to reduce measurement error in the responses to survey items, to reduce statistical error in the analysis of results, and to reduce inferential error in causal models of the associations among the measures. Nevertheless, increasing confidence in survey technology has paradoxically reinforced a debilitating theoretical blinder that has compromised the ability of social science to elicit confidence in predictions. What is worse, this blinder has largely escaped notice through a combination of ideological bias and reluctance to pull back the covers on problems for which we have no solution. The good news is that a solution is finally on the horizon.
Speaker: Michael Macy, Goldwin Smith Professor of Arts and Sciences and Director of the Social Dynamics Laboratory at Cornell University
Abstract: The personal stories that people post to their public weblogs offer a glimpse into the everyday lives of people. In this talk I will discuss our efforts to automatically gather tens of millions of these stories, and use them as a dataset for investigating different populations of authors. I will discuss our work on analyzing the stories that people tell about health emergencies (strokes), and how this led us to concerns about sample bias. I will describe our ongoing work on bias correction for social-media samples, and discuss opportunities afforded by populations of extremely prolific webloggers whose demographic information can be readily extracted from the stories they share.
Speaker: Andrew S. Gordon, Institute for Creative Technologies, University of Southern California
Abstract: With the widespread adoption of social media sites like Twitter and Facebook, there has been a shift in the way information is produced and consumed in our societies. Traditionally, information was produced by large news organizations, which broadcast the same carefully-edited information to all consumers over mass media channels. In contrast, in online social media, any user can be a producer of information, and every user selects which other users she connects to, thereby choosing the information she consumes. Furthermore, recommender systems deployed on most social media sites provide users with additional information that is tailored to their individual tastes.
In this talk, I will introduce the concept of information diet – which is the topical distribution of a given set of information items (e.g., tweets) – to characterize the information produced and consumed by various types of users on the popular social media platform Twitter. At a high level, we find that (i) popular users mostly produce very specialized diets focusing on only a few topics; in fact, news organizations (e.g., NYTimes) produce much more focused diets on social media than their mass media diets, (ii) most users’ consumption diets are primarily focused on one or two topics of their interest, and (iii) the personalized recommendations provided by Twitter help to mitigate some of the topical imbalances in the users’ consumption diets, by adding information on diverse topics apart from the users’ primary topics of interest.
Speaker: Krishna Gummadi, Max Planck Institute for Software Systems Saarbrücken
Abstract: An interdisciplinary study on the representation of street art on Flickr was conducted at Leibniz Universität Hannover. One sociologist, two computer scientists and several research assistants took part in the investigation. The starting point of the research project was Ulf Wuggenig's thesis that the Internet is also changing the way street art is received and perceived. According to this thesis, it is above all the Internet, rather than established art institutions, that promotes the representation and recognition of street artists. For this purpose, a visual content analysis was carried out. The research project found that a street art-"oriented" community exists on Flickr and ensures the representation of street art. In my talk I will therefore discuss how available metadata can be used to describe this community, and how street art is represented on Flickr. In this context I will also address further questions that the research has raised for me.
Speaker: Axel Philipps, Leibniz University of Hannover
Abstract: Wikipedia is a huge global repository of human knowledge, and at the same time one of the largest experiments of online collaboration. Its articles, their links and the negotiations around their content tend to reflect societal debates in different language communities. The first work I will present is Contropedia, a platform that adds a layer of transparency to Wikipedia articles. Combining activity from the edit history and discussion in talk pages, the platform uses wiki links as focal points to explore the development of controversial issues over time. The second study focuses on the network of hyperlinks in different language editions of Wikipedia. A ranking of the most central biographies in each language edition is used to study relationships and influences between cultures. Finally, I will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions, focusing on how emotion and dialogue differ depending on the editors' status, gender, and communication network.
Speaker: David Laniado from Barcelona Media
Abstract: Modeling social media and blogging is a fashionable topic of research. The challenges include how to model, exploit, store, and analyze social content data. In this talk we will discuss novel approaches to exploiting such data in order to (1) build new types of recommender systems and (2) discover and understand the evolution of topics over time.
Speaker: Puya Hossein Vahabi, Yahoo Labs Barcelona
Since the 1970s, urban theories proposed by Lynch and Milgram have aimed at understanding complex city dynamics. Can these theories be put to use for enabling new mobile services? The answer is a definitive “Yes!”. Existing mapping technologies return the shortest directions. To complement them, we are designing new mobile phone tools that return directions that are not only short but also make pedestrians' experience happier. To capture a fuzzy concept such as happiness, we have combined Flickr metadata with the crowdsourcing site urbangems.org, which I co-designed with colleagues at the University of Cambridge. This site crowdsources visual perceptions of quiet, beauty and happiness across the city of London using pictures of street scenes.
Speaker: Daniele Quercia, Yahoo Labs Barcelona
Abstract: The problem of understanding the dynamics of collective attention has been identified as a key scientific challenge for the information age. In this talk, we first show that search behaviors of large populations of Internet users evolve in a highly regular manner and that the corresponding time series can be modeled using skewed distributions. We then ask if such dynamics could be explained in terms of infectious processes that take place in social networks and derive a physically plausible model for the temporal dynamics of graph diffusion processes. Our results are based on maximum entropy arguments and provide new approaches to problems in network analysis and mining.
Speaker: Christian Bauckhage, Fraunhofer IAIS
Abstract: Online participatory media, such as social networking sites and forums, changed Internet users from simple information consumers to active producers of online content, turning our societies into "digital democracies". As part of a Swiss SNF funded project, we explore the dynamics of polarization of opinions and social structures through the digital traces left by politicians and voters in online participatory media.
Our first study focuses on Politnetz, a Swiss platform focused on political activity, composed of support links between politicians, comments, and likes. We analyzed network polarization as the level of intra-party cohesion with respect to inter-party cohesion, finding that support links show a very strongly polarized structure with respect to party alignment. We found that comment structures follow topics related to Swiss politics, and that polarization in likes evolves over time, increasing as the federal elections of 2011 approached. Furthermore, we analyzed the internal social structure of each party through social network metrics related to hierarchical structures and information efficiency. This analysis highlights patterns in the relation between the connectivity of parties and their political position within a multi-party system. Our second work analyzes the evolution of the 15M movement through its digital traces in the Twitter social network. We analyzed the tweets related to the movement during 30 days around its creation, providing an illustration of the evolution and structure of the movement at its collective and individual levels. We found patterns of influence of collective action and mass media in the polarization of opinions about the movement, and found different stages of movement formation and expansion through Twitter activity. Our sentiment and psycholinguistic analysis of the content of tweets reveals that activity cascades with strong negative sentiment and social-related terms spread to larger numbers of users. At the individual level, we found that users who are more embedded in the movement display higher levels of activity and express stronger negativity, in line with the overall negative context of the movement.
Speaker: David Garcia, ETH Zürich
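To make the notion of "intra-party cohesion with respect to inter-party cohesion" from the abstract above concrete, here is a minimal sketch that compares the density of links within parties to the density of links between parties. This simple density difference is an assumption chosen for illustration; the metric actually used in the study may differ, and the politicians, parties, and links below are hypothetical. Links are treated as undirected for simplicity.

```python
from itertools import combinations


def polarization(edges, party):
    """Intra-party link density minus inter-party link density.

    edges: iterable of (node, node) pairs, treated as undirected.
    party: dict mapping each node to its party label.
    Returns a value in [-1, 1]; higher means links stay within parties.
    """
    edge_set = {frozenset(e) for e in edges}
    intra_pairs = inter_pairs = intra_edges = inter_edges = 0
    for a, b in combinations(party, 2):
        linked = frozenset((a, b)) in edge_set
        if party[a] == party[b]:
            intra_pairs += 1
            intra_edges += linked
        else:
            inter_pairs += 1
            inter_edges += linked
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra - inter


# Hypothetical politicians, party labels, and support links.
party = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
support = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
           ("b1", "b2"), ("b2", "b3"), ("a3", "b1")]
score = polarization(support, party)
print(f"polarization: {score:.3f}")
```

A value near 1 would indicate that almost all links fall within parties, while a value near 0 would indicate that party membership says little about who links to whom.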
Abstract: Though online social network research has exploded in recent years, not much thought has been given to exploring the nature of the social structures that compose these networks. Online interactions have been interpreted as indicative of one social process or another (e.g., status exchange or trust), often with little systematic justification regarding the relation between observed data and theoretical concept. Our research aims to bridge this gap in computational social science by trying to explain the nature and purpose of social structures, with quantitative metrics that are directly derived from longstanding concepts in the social sciences. In this talk we will discuss the characterization of social links and social groups. We propose a method based on Blau's notion of resource exchange that discovers, with high accuracy, the fundamental domains of interaction occurring over links in social networks. By applying this method to two online datasets that differ in scope and type of interaction (aNobii and Flickr), we observe the spontaneous emergence of three domains of interaction representing the exchange of status, knowledge and social support. By finding significant relations between the domains of interaction and classic social network analysis issues (e.g., tie strength, dyadic interaction over time), we show how the network of interactions induced by the extracted domains can be used as a starting point for more nuanced analysis of online social data that may one day incorporate the normative grammar of social interaction. We also explore the nature of online groups through the lens of common identity and common bond theory, defining a set of features to classify groups into those two categories. We show that the classification works with high accuracy on Flickr groups.
Speaker: Luca Maria Aiello, Yahoo Labs Barcelona
16. June 2014, 2pm, Taha Yasseri, What we could read from Wikipedia apart from its articles? Conflicts, Power, Fame, and Money.
Abstract: Wikipedia is the largest encyclopaedia in the world and seeks to "create a summary of all human knowledge". An encyclopaedia is supposed to contain a collection of objective facts, reported by secondary sources. However, the crowdsourced nature of Wikipedia makes it a source of information in itself, reflecting the interests, preferences, opinions, and priorities of the members of its community of editors. By analysing the editorial conflicts between the editors of different language editions, we can create interesting images of each language community's interests and concerns. Moreover, the page view statistics of Wikipedia articles provide a unique insight into the patterns of information seeking by its readers. In this presentation, we start with Wikipedia edit wars and discuss what we can learn from the warring patterns about real-life facts. Three examples are then shown, in each of which statistics of editorial activity and page views are considered as proxies to assess the popularity and visibility of items. The three topics we have investigated are the movie market, elections, and scientific reputation, and we observed that, under certain conditions, there is a high correlation between popularity and the volume of Wikipedia edits and page views. Based on these correlations and in the presence of external data to calibrate a predictive model, one is able to forecast the prospective success of an item in a reasonably accurate way.
Speaker: Taha Yasseri, Oxford Internet Institute
FuturICT is a global initiative pursuing a participatory approach, integrated across the fields of ICT, the social sciences and complexity science, to design socio-inspired technology and develop a science of global, socially interactive systems. The initiative wants to bring together, on a global level, Big Data, new modelling techniques and new forms of interaction, leading to a new understanding of society and its co-evolution with technology. The goal is to create a major scientific drive to understand, explore and manage our complex, connected world in a more sustainable and resilient manner.
The initiative is motivated by the fact that ubiquitous communication and sensing blur the boundaries between the physical and digital worlds, creating unparalleled opportunities for understanding the socio-economic fabric of our world, and for empowering humanity to make informed, responsible decisions for its future. The intimate, complex and dynamic relationship between global, networked ICT systems and human society directly influences the complexity and manageability of both. This also opens up the possibility to fundamentally change the way ICT will be designed, built and operated, reflecting the need for socially interactive, ethically sensitive, trustworthy, self-organized and reliable systems.
It is planned to build a new public resource - value-oriented tools and models to aggregate, access, query and understand vast amounts of data. Information from open sources, real-time devices and mobile sensors would be integrated with multi-scale models of the behaviour of social, technological, environmental and economic systems, which could be interrogated by policy-makers, business people and citizens alike. Together, these would build an eco-system leading to new business models, scientific paradigm shifts and more rapid and effective ways to create and disseminate new knowledge and social benefits - thereby forming an innovation accelerator.
Speaker: Dirk Helbing, ETH Zürich
24. February 2014, 10am, Cristian Danescu-Niculescu-Mizil, "Language and Social Dynamics in Online Communities"
More and more of life is now manifested online, and many of the digital traces that are left by human activity are in natural-language format. In this talk I will show how exploiting these resources under a computational framework can bring a new understanding of online social dynamics; I will be discussing two of my efforts in this direction.
The first project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a determined life-cycle with respect to their susceptibility to adopt new community norms, and how this insight can be harnessed to predict how long a user will stay active in the community.
The second project proposes a computational framework for identifying and characterizing politeness, a central force shaping our communication behavior. I will show how this framework can be used to study the social aspects of politeness, revealing new interactions with social status and community membership.
This talk includes joint work with Dan Jurafsky, Jure Leskovec, Christopher Potts, Moritz Sudhof and Robert West.
Speaker: Cristian Danescu-Niculescu-Mizil
6. January 2014, 10.30am, Emilio Ferrara, "Connecting online and offline worlds with social media big data: Twitter trends and social mobilization"
The increasing availability of data across different socio-technical systems, such as online social media, mobile phone networks, and collaborative knowledge platforms, presents novel challenges and intriguing research opportunities. As more online services permeate our everyday life and as data from various domains are connected and integrated with each other, the boundary between the ‘real world’ and the ‘virtual online world’ becomes blurry. Scholars from different fields now have rich sources of information on individual behaviors at a scale that was hardly conceivable only a decade ago. Such data cover both online and offline activities of people, as well as multiple time scales, prompting a variety of research questions on human behaviors and activities in the real and online worlds. In this talk I will discuss two examples of how online and offline worlds interact and affect each other. In the first case, I'll show how online conversation on Twitter triggers and responds to real-world events in the context of the Occupy Wall Street social mobilization. In turn, the second example will illustrate how human mobility affects topics of discussion on such online platforms: I'll draw a parallel between information diffusion and epidemic spreading, showing that the dynamics driving the emergence of collective attention and trends are tightly interconnected with individuals' mobility in the real world.
Speaker: Emilio Ferrara
11. December 2013, 10.30am, Vera Liao, "Breaking out of the Echo Chamber: Understanding and Designing for Cross-ideology Discussions"
Nowadays Internet users rely heavily on many information filters, in the guise of search engines, recommender systems, and the people they follow on social media, to filter out unfavorable information and seek agreeable opinions. Critics have warned that these filters would isolate people in their own belief, cultural and ideological echo chambers. As a result, we may see increasing social fragmentation and political polarization in our society. This talk will discuss recent Human-Computer Interaction (HCI) research that attempts to mitigate the problem. We conduct user studies and analyze user-generated data on the Web to understand how people behave cognitively and socially in ideologically diverse online environments. By leveraging such knowledge and relevant psychological theories, we design interfaces that encourage the seeking of diverse opinions and stimulate cross-ideology conversations.
Speaker: Q. Vera Liao
27. November 2013, 3pm, Andreas Jungherr, "Twitter data as information source on political communication and political campaigns: Examples from Germany"
The microblogging service Twitter is increasingly becoming a tool for political communicators and the public in various countries to communicate about politics. This is also true in Germany, which offers an interesting case for analyzing the impact of Twitter on political communication. Since 2009, Twitter has been a central tool in election campaigns, political activism, the self-marketing of politicians, and the media coverage of campaigns. The talk will address recent research on the use of Twitter during the campaigns for the federal elections of 2009 and 2013 and during one of the biggest political protests in Germany’s recent past, the protests against the infrastructure project Stuttgart 21. Questions addressed in this talk will be: Do political Twitter messages predict the results of elections in Germany? Which kinds of political events lead to spikes in Twitter messages commenting on politics? And how do politically vocal Twitter users use Twitter during campaigns?
Speaker: Andreas Jungherr
Scholars of media and communication sciences study the role of media in our society. They frequently search through media archives to select items that cover a certain event. When this is done for large time spans and across media-outlets, this task can however be challenging and laborious. The interdisciplinary project PoliMedia aims to stimulate and facilitate large-scale, cross-media analysis of the coverage of political events. We focus on the meetings of the Dutch parliament, and provide automatically generated links between the transcripts of those meetings, newspaper articles, including their original lay-out on the page, and radio bulletins. Via a web application users are able to search through the debates and find related media coverage in various media outlets, facilitating a more efficient search process and analysis of the media coverage. Furthermore, the generated links are available in an online database, allowing quantitative analyses with complex, structured queries.
Speaker: Laura Hollink