The GESIS Computational Social Science (CSS) Seminar is a monthly English-language event for expert exchange on topics around data science and social analytics.
Social networks are complex and dynamic systems. Individual nodes in networks, however, do not necessarily have an overview of the network as a whole, but are mostly affected by their smaller (micro-level) neighborhoods. At the same time, emerging large-scale (macro-level) network outcomes such as segregation, cluster formation, or the distribution of knowledge have a direct impact on them and can restrict their opportunities to act. In the study of social network dynamics it is thus important to simultaneously consider two levels: the macro-level of large-scale network structures and the micro-level of individuals’ preferences, opportunities and actions. This talk illustrates how state-of-the-art statistical network methods and computational techniques can be combined to investigate the micro-macro link in social networks. Recent empirical work in the context of the Swiss StudentLife study will illustrate the value of this approach.
Christoph Stadtfeld is an assistant professor of Social Networks at ETH Zürich. His research focuses on the development and application of theories and statistical methods for social network dynamics. He holds a PhD from Karlsruhe Institute of Technology and has been postdoctoral researcher and Marie-Curie fellow at the University of Groningen, the Social Network Analysis Research Center in Lugano, and the MIT Media Lab. His work is published in leading sociological and interdisciplinary journals including Social Networks, Social Forces, Sociological Science, Sociological Methodology, and PNAS.
Speaker: Christoph Stadtfeld, ETH Zürich
With the advent of large-scale data and the concurrent development of robust scientific tools to analyze them, important discoveries are being made in a wider range of scientific disciplines than ever before. A field of research that has gained substantial attention recently is the analytical, large-scale study of human behavior, where many analytical and statistical techniques are applied to various behavioral data from online social media, markets, and mobile communication, enabling meaningful strides in understanding the complex patterns of humans and their social actions.
The importance of such research originates from the social nature of humans, an essential trait that clearly needs to be understood if we are ultimately to understand ourselves. Another essential human trait is creativity: humans continually express inspirations or emotions in various physical forms such as a picture, sound, or writing. As we are successfully probing the social behaviours of humans through science and novel data, it is natural and potentially enlightening to pursue an understanding of the creative nature of humans in an analogous way. What makes such research even more promising is that human creativity has always been in an interplay of mutual influence with scientific and technological advances, being supplied with new tools and media for creation, and in return providing valuable scientific insights. In this talk, the speaker will present recent work on the mathematical analysis of color contrast in painting, and on the construction of content-based influence networks in culture.
Juyong Park is an Associate Professor of Culture Technology at KAIST - Korea Advanced Institute of Science & Technology. The speaker holds a Ph.D. in Physics and Complex Systems from the University of Michigan and was a Research Fellow at Northeastern University and Harvard Medical School. His research interests include culture and cultural phenomena from a complex-systems perspective.
Speaker: Juyong Park, KAIST
Many social, technological or biological systems are formed by a complex pattern of connections between their constituents; for example, billions of users interact through online networks such as Facebook and Instagram, billions of electronic machines interact via physical connections in the Internet, and thousands of billions of synaptic connections comprise the neural network of our brain. The network structure of such systems is known to have a huge impact on how they operate. Here, Ali Faqeeh reviews some of his recent findings about the effect of structure on networked behavior and discusses potential applications in the social sciences. The speaker presents results regarding different aspects of structural properties, including multilayer networks, hidden network geometry, community structures, and noisy (imperfect) data on structure and/or dynamics. Ali Faqeeh discusses how each of these properties plays a crucial role in various applications such as determining the robustness of networks, optimal vaccination strategies, efficient navigation, and identification of the most influential spreaders.
Ali Faqeeh is a postdoctoral fellow at the Mathematics Application Consortium for Science and Industry (MACSI), University of Limerick, Ireland, and a non-resident research fellow of the Center for Complex Networks and Systems Research (CNetS), Indiana University, Bloomington, IN, USA. He is part of the project “Mathematical modeling of social spreading phenomena” and is currently working on the dissemination of scientific research through academic publications, the spread of online information on Twitter, and the identification of the most influential spreaders on online platforms. From 2016 to 2018 Ali Faqeeh was a postdoctoral researcher at CNetS and at the School of Informatics, Computing, and Engineering, Indiana University, USA. He holds a Ph.D. in Applied Mathematics from the University of Limerick, Ireland (2016), an M.Sc. in Condensed Matter Physics (2012) and a B.Sc. in Physics (2009) from Isfahan University of Technology, Iran. His research interests include computational social science, complex networks and systems, and modeling of stochastic processes.
Speaker: Ali Faqeeh, University of Limerick, Ireland and Indiana University, Bloomington
Search engines are seen by their users as trustworthy and neutral intermediaries between users and the content of the web. This is not true, however, as can be seen from the self-interest of search engine operators, which has led, among other things, to an antitrust case by the European Commission against Google. On the other hand, content providers and the search engine optimizers they commission have considerable opportunities to influence the search results of Google and other search engines in their favour.
This raises the question of what results or what kind of results users get to see in the top positions of search engines. Dirk Lewandowski and his colleagues seek answers to this question by automatically evaluating the top results for a large number of search queries on the same topic. They extract the search queries from search engine log files so that they can realistically map the query behavior of users. The analysis of the search results takes place both on the level of the domain and on the level of the providers behind them (by automatically collecting the imprint data of the websites).
In addition to software development, Dirk Lewandowski and his colleagues analysed search queries on the subject of insurance comparisons as a first use case. Among other things, it became apparent that Google's top search results, from which the majority of the hits are selected by the users, are provided by only a few companies and that these companies can thus exert a strong influence on the perception of a topic. Other topics that they will work on include gender stereotypes in the search results, controversial topics such as nuclear power or economic topics such as financing.
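The domain-level part of such an analysis can be illustrated with a minimal Python sketch. It is not the software developed by the speaker's group; the function name `domain_shares` and the example URLs are hypothetical, and the provider-level step (collecting imprint data) is left out. The sketch simply counts how often each domain appears among a set of top results and reports its share.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(result_urls):
    """Return each domain's share of the given result URLs, largest first."""
    # Crude domain extraction: the network location of the URL, lower-cased.
    domains = [urlparse(url).netloc.lower() for url in result_urls]
    counts = Counter(domains)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()}


# Made-up top results for several queries on one topic (illustrative only).
top_urls = [
    "https://compare.example/car",
    "https://compare.example/health",
    "https://insurer.example/offers",
    "https://compare.example/home",
]
shares = domain_shares(top_urls)
for domain, share in shares.items():
    print(f"{domain}: {share:.0%}")
```

A high share for a handful of domains across many queries on the same topic would point to the kind of concentration described above.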
Dirk Lewandowski is a professor of information research and information retrieval at the Hamburg University of Applied Sciences, Germany. He is the editor of Aslib Journal of Information Management (formerly: Aslib Proceedings), an ISI-ranked information science journal. Dirk Lewandowski studied library science at the School of Library Science in Stuttgart, as well as philosophy, information science, and media studies at Heinrich Heine University in Düsseldorf. He received his Ph.D. from that university in 2005.
Dirk has published extensively in the areas of Web information retrieval, search engine user behaviour and the role that search engines play in society. His work has been published in some of the leading information science journals, including JASIST, Journal of Information Science and Journal of Documentation. Dirk has served as an expert to, among others, the High Court of Justice (UK) and the Deutscher Bundestag (German Parliament). He was named an ACM Distinguished Speaker in 2016.
Prof. Lewandowski authored and edited several books on search engines, including “Suchmaschinen verstehen” (Springer, 2015) and “Web Search Engine Research” (Emerald Group Publishing, 2012), as well as a series of German-language handbooks on search.
Speaker: Dirk Lewandowski, Hamburg University of Applied Sciences
The last decades of psychological research have been dominated by self-report questionnaires as the primary method of data collection. This method has been useful for studying inner processes such as feelings, thoughts and emotions, as well as behavior. Questionnaire data are also known to be subject to a series of biases, such as response styles (e.g. social desirability), ecological invalidity and memory effects. However, due to the lack of alternatives, the field of Psychology has relied on questionnaire data ever since. Currently, the digitalization of our society is progressing rapidly, from the use of smartphones and wearables to the habitation of fully digital smart homes and environments. Whereas some years ago phones were merely simple communication devices, the application ecosystems of modern phones are able to satisfy a wide range of daily human needs such as surfing the web, banking, listening to music, and dating, to name a few. Furthermore, smartphones are equipped with a large number of sensors and computational capabilities. As a natural byproduct of user interactions, smartphones produce large amounts of data about where, when, and how people do what with their phones. With regard to the aforementioned lack of “real data” in Psychology, usage of these data could allow for the systematic investigation of individual differences both across individuals (traits: big-five personality, demographics) and as processes within single individuals over time (states: emotions, mood). In this talk Clemens Stachl will present the current state of the PhoneStudy mobile sensing project at the Ludwig-Maximilians-Universität München and first insights from the collected data. The aim of the PhoneStudy project is the development of a tool for both the collection and the analysis of actual behavioral and situational data in Psychology.
Clemens Stachl is a postdoctoral researcher at the Chair for Psychological Methods and Assessment at Ludwig-Maximilians-Universität München. In his research, he focuses on the collection and analysis of behavioral data by means of consumer electronics (e.g. smartphones, cars etc.). Currently, he is investigating the possibility of predicting psychological traits (e.g. personality) from digital traces of behavior, collected with the PhoneStudy smartphone app.
Speaker: Clemens Stachl, Ludwig-Maximilians-Universität München
Twitter research to date has focused mainly on the study of isolated events, as described for example by specific hashtags or keywords relating to elections, natural disasters, public events, and other moments of heightened activity in the network. This limited focus is determined in part by the limitations placed on large-scale access to Twitter data by Twitter, Inc. itself. This research presents the first ever comprehensive study of a national Twittersphere as an entity in its own right. It examines the structure of the follower network amongst some 4 million Australian Twitter accounts and the dynamics of their day-to-day activities, and explores the Australian Twittersphere's engagement with specific recent events.
Dr. Axel Bruns is a Professor in the Digital Media Research Centre at Queensland University of Technology in Brisbane, Australia, and was a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation (CCi). He is the President of the Association of Internet Researchers. Bruns is the author of Blogs, Wikipedia, Second Life and Beyond: From Production to Produsage (2008) and Gatewatching: Collaborative Online News Production (2005), and a co-editor of Twitter and Society (2014), A Companion to New Media Dynamics (2012) and Uses of Blogs (2006). Bruns is an expert on the impact of user-led content creation, or produsage, and his current work focusses on the study of user participation in social media spaces such as Twitter, especially in the context of acute events.
Bruns’s main research interests are in social media, ‘big data’ research methods, produsage, citizen journalism, and online communities.
Speaker: Axel Bruns, Queensland University of Technology
In this talk Bruno Ribeiro generalizes traditional node/link prediction tasks in temporal attributed networks to consider joint predictions over larger $k$-node induced subgraphs. Ribeiro shows why traditional network models fail at this task and introduces a potential solution to the problem. His key insight is to incorporate the unavoidable data dependencies in training into both the input features and the model architecture itself via high-order dependencies and subgraph embeddings. The strength of the representation is its invariance to isomorphisms and varying local neighborhood sizes, while still being able to take node/edge labels into account in an inductive model which can be applied to unseen data. Learning also requires new sampling methods, and he will introduce the concept of Markov Chain Las Vegas for optimization as a more principled and flexible alternative to Contrastive Divergence.
Bruno Ribeiro is an Assistant Professor at the Department of Computer Science at Purdue University. He obtained his Ph.D. at University of Massachusetts Amherst and did his postdoctoral studies at Carnegie Mellon University. His research interests are in machine learning, with a focus on sampling and modeling relational and temporal data.
Speaker: Bruno Ribeiro, Purdue University
In 1998, Sir Tim Berners-Lee famously pleaded that “Cool URIs don’t change”: Content on the web should always remain accessible through one and exactly one address. Almost twenty years later, nothing could be further from the truth. Instead, we are rapidly moving to a world where everything online is different for everybody, all the time. Personalized sites are tailored to user preferences, posts and comments are edited, hidden or deleted as time goes by, and every bit of information has an abundance of copies, variants and remixes.
The resulting challenges to empirical methods are huge: It has become virtually impossible to gather representative samples of online data and, even worse, even those samples often don’t reflect what users see! Taking stock of current approaches to digital methods, I will argue that many widely-used practices will soon be obsoleted by new technologies, changing user behavior and declining access to APIs. However, by embracing the disjunct nature of the new web, we can expand our set of methods to secure scientific access, bolster precision and develop new theoretical avenues.
Pascal Jürgens is a Research Associate at the Department of Communication at the Johannes Gutenberg-University of Mainz. His research focuses on the diffusion of information; fragmentation of and through information behavior; political communication, participation and protest culture online (e.g., petitions, protest movements, ad hoc incidents); computational quantitative methods (computer-based content analysis, time series, etc.); and social network analysis (Twitter, Facebook).
Speaker: Pascal Jürgens, Johannes Gutenberg-University of Mainz
Fairness in machine learning is an important and popular topic these days. Most papers in this area frame the problem as estimating a risk score. For example, Jack’s risk of defaulting on a loan is 8, while Jill's is 2. These algorithms are supposed to produce decisions that are probabilistically independent of sensitive features (such as gender and race) or their proxies (such as zip codes). Examples of such fairness criteria include precision parity, true positive parity, and false positive parity between groups in the population. In a recent paper, Kleinberg, Mullainathan, and Raghavan (arXiv:1609.05807v2, 2016) presented an impossibility result on simultaneously satisfying three desirable fairness properties when estimating risk scores with differing base rates in the population. Tina Eliassi-Rad takes a broader notion of fairness and asks the following two questions: Is there such a thing as just machine learning? If so, is just machine learning possible in our unjust world? The speaker will describe a different way of framing the problem and will present some preliminary results.
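To make the three parity criteria named above concrete, the following minimal Python sketch computes precision, true positive rate and false positive rate separately for each group. It assumes binary decisions rather than risk scores; the function name `group_rates` and the toy data are invented for illustration and do not come from the talk. Parity in one of these measures means that its value is (approximately) equal across groups.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group precision, true positive rate and false positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred and truth:
            key = "tp"
        elif pred:
            key = "fp"
        elif truth:
            key = "fn"
        else:
            key = "tn"
        counts[group][key] += 1

    def ratio(num, den):
        # Undefined rates (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        group: {
            "precision": ratio(c["tp"], c["tp"] + c["fp"]),
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
        }
        for group, c in counts.items()
    }


# Made-up outcomes (1 = default), decisions (1 = flagged) and group labels.
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(y_true, y_pred, groups)
print(rates)
```

In this toy data the two groups differ on all three measures, so none of the three parity criteria holds.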
Tina Eliassi-Rad is an Associate Professor of Computer Science at Northeastern University in Boston, MA. She is also on the faculty of Northeastern's Network Science Institute. Prior to joining Northeastern, Tina was an Associate Professor of Computer Science at Rutgers University; and before that she was a Member of Technical Staff and Principal Investigator at Lawrence Livermore National Laboratory. Tina earned her Ph.D. in Computer Sciences (with a minor in Mathematical Statistics) at the University of Wisconsin-Madison. Her research is rooted in data mining and machine learning; and spans theory, algorithms, and applications of massive data from networked representations of physical and social phenomena. Tina's work has been applied to personalized search on the World-Wide Web, statistical indices of large-scale scientific simulation data, fraud detection, mobile ad targeting, and cyber situational awareness. Her algorithms have been incorporated into systems used by the government and industry (e.g., IBM System G Graph Analytics) as well as open-source software (e.g., Stanford Network Analysis Project). In 2010, she received an Outstanding Mentor Award from the Office of Science at the US Department of Energy.
Speaker: Tina Eliassi-Rad, Northeastern University
The availability of real-world data provides partial and indirect observations of real-world phenomena, making it possible to study and understand these phenomena. Unsupervised learning methods, such as clustering and latent feature models, are suitable techniques to ease the data analysis process, allowing us to both analyze and make predictions on the data. However, most of the existing techniques assume the data to be homogeneous and i.i.d. This assumption might be too limiting in many real-world application domains, such as computational social science, where we often aim to analyze user data which contain not only socio-demographic information about the users (i.e., heterogeneous data) but also users’ activity data (i.e., time-dependent data).
In this talk, Isabel Valera will present the key ideas behind her approach to unsupervised learning both in heterogeneous datasets, which contain mixed continuous and discrete observations, and in continuous-time data, which are abundant in an increasingly networked digital world. The speaker will then use the proposed approach to analyze and make predictions on data collected from different application domains, including social networks.
Isabel Valera is a Minerva research group leader at the Max Planck Institute for Intelligent Systems. Isabel develops flexible and efficient probabilistic models and inference algorithms to fit and analyze real-world data. She is particularly interested in problems related to the unstructured and complex nature of real-world data, which are often time-dependent, heterogeneous, noisy, and might contain errors and missing values. Isabel obtained her PhD in 2014 and her MSc degree in 2012, both from the University Carlos III in Madrid, Spain. She has been a German Humboldt Post-Doctoral Fellowship Holder, and was recently awarded a Minerva fast track research group by the Max Planck Society.
Speaker: Isabel Valera, Max Planck Institute for Intelligent Systems
Over the last ten years, researchers have found themselves confronting a massive increase in available data sources. In the debates on how to use these new data, the research potential of “digital trace data” has featured prominently. While various commentators expect digital trace data to create a “measurement revolution”, empirical work has fallen somewhat short of these grand expectations. In this talk, Andreas Jungherr will attempt to trace the reasons for this. For one, the traditional fields in the social sciences (perhaps with the exception of communication science) have shown a disappointing disinterest in phenomena connected with the impact of the digital revolution on social life. This has led social scientists to neglect actively developing new concepts or adapting existing ones to account for the potential influence of digital technology on various aspects of social life. Second, the growing availability of digital trace data has led computer scientists to address questions traditionally in the purview of social science in their work. Unfortunately, this growing interest in social phenomena as a research object has not come with a critical reflection on the specifics of this research object or with engagement with available concepts and the current state of the respective topical research fields. Accordingly, empirical findings from this work are predominantly poorly connected with central debates in the social sciences and have therefore failed to make an impact there. Finally, the nature of digital trace data as a data source for inferences on social phenomena has not been appropriately reflected upon. Instead of naively treating them as a true mirror of social phenomena, these data have to be critically interrogated according to their respective data generating processes in order to identify which elements of social life they can inform on.
Only if these challenges are met by the field will we start to realize the promise of digital trace data in the social sciences.
Andreas Jungherr is Assistant Professor for Social Science Data Collection and Analysis at the University of Konstanz. His research focuses on the impact of digital technology on political communication and the use of digital trace data in the social sciences. His research has been published in the Review of International Political Economy, Journal of Communication, Journal of Computer-Mediated Communication, and The International Journal of Press/Politics.
Speaker: Andreas Jungherr, University of Konstanz
Human cooperation, although not a brand new topic in science, keeps attracting attention and still leaves many questions unanswered. In this talk, María Pereda summarizes the main works of her research career, from models to experiments about human cooperation. In addition, the speaker presents the idea of a new experiment on perception biases planned to be carried out at RWTH Aachen University, inspired by previous work from researchers at GESIS and RWTH Aachen.
María Pereda is a postdoctoral researcher at RWTH Aachen University, working with Markus Strohmaier at the Computational Social Sciences and Humanities group. Before that, she was a postdoctoral researcher at the University Carlos III de Madrid (Spain) in the multidisciplinary group for complex systems, GISC, working with Anxo Sanchez in the IBSEN project, which aimed to build a repertoire of human behavior in large (more than 1000 people) structured groups using controlled experiments. She did her first postdoctoral research period at the University of Burgos (Spain), studying the emergence and resilience of cooperation in ancient societies using complex systems methodologies. She earned a Bachelor’s degree in Industrial Engineering, specializing in Electronics, in 2006 and a degree in Industrial Organisation Engineering (with distinction) in 2008, both at the University of Burgos. She earned a Master’s Degree in Research in Process Systems Engineering in 2010 and a Ph.D. in Process Systems Engineering (with distinction) at the University of Valladolid in March 2014. Her Ph.D. research applied different artificial intelligence techniques to an automatic control problem: the control of a wastewater treatment plant. Her major research interest is the study of complex systems and the discovery of patterns and unpredictable behaviors. The main methods of her research so far have been modelling, machine learning, game theory and network theory.
Speaker: María Pereda, RWTH Aachen University
The analysis of political violence and contention using event data has become the state-of-the-art in the discipline. Most of these event datasets are based on media reports, which are known to have different biases. This talk discusses two of them: the selection problem, which refers to the fact that media sources have uneven coverage across the world, and the accuracy problem, which means that the media may systematically misreport certain types of information. The talk presents analyses assessing the severity of these biases in conflict event data, and discusses implications for event data coding and analysis.
Nils B. Weidmann is Professor of Political Science and head of the "Communication, Networks and Contention" Research Group at the Department of Politics and Public Administration, University of Konstanz. Previously, he held research fellowships at the Centre for the Study of Civil War, Peace Research Institute Oslo (2011-12), the Jackson Institute, Yale University (2010-11), and the Woodrow Wilson School, Princeton University (2009-10). Nils received an M.Sc. in Computer Science from the University of Freiburg (Germany) in 2003, an M.A. in Comparative and International Studies from ETH Zurich (Switzerland) in 2008 and a Ph.D. in Political Science from ETH Zurich. His research deals with violent and non-violent contestation, with a particular focus on the impact of communication and information technology.
Speaker: Nils B. Weidmann, University of Konstanz
Digital media have changed political communication in modern societies. One of these changes is that laypersons increasingly exchange their political opinions in social networks and online communities. A specific concern about this new form of political communication is that it promotes attitude polarization and, thus, the fragmentation of societies due to a phenomenon that has lately been described as political homophily. The hypothesis is that, in social media, people are more likely to select and exchange political content and communication that is consistent with their personal political attitudes. Due to this homogeneous information environment, pre-existing political attitudes are more likely to be affirmed and reinforced than in the offline world.
In Germany, political communication on Facebook drew public attention during the so-called “Refugee Crisis” in 2015 and 2016. Mr. Rothmund and his colleagues conducted two empirical studies in order to investigate whether and how political homophily could be observed in Facebook communication in this specific context. First, Tobias Rothmund and his colleagues conducted an online survey (N = 894, April 2016) to investigate whether Facebook users were more likely to report (a) selective exposure to information on the refugee situation in Germany and (b) a stronger false consensus effect with regard to their political attitude on this topic. Their analyses revealed significant three-way interactions (Facebook Use x Attitude Valence x Attitude Strength) on both variables. Selective exposure and the false consensus effect were correlated with attitude strength especially among Facebook users with negative attitudes towards refugees. Second, Mr. Rothmund and his colleagues investigated political communication in Facebook groups (N = 51,177 participants) that were either concerned with supporting refugees (e.g., Refugees.Welcome.Regensburg) or with criticizing the German government for the way it handled the crisis (e.g., Rücktritt Merkel & co.). News feeds were content-analyzed between June 2015 and May 2016. Tobias Rothmund and his colleagues found evidence for differences in content and structure of political communication in Facebook groups.
The methodology and the results of the present studies are discussed in the light of the theoretical framework of political homophily and new challenges and trends in political communication and its academic investigation.
Tobias Rothmund is a junior professor of political psychology at the Institute for Communication Psychology and Media Education at the University of Koblenz-Landau. His research focuses on the psychological function of trust for cooperation in social groups and society; stability and change of political attitudes and ideologies; psychological reactions to norm violations and experiences of injustice in political decision making; reception and effects of violence in mass media; and motivated reception of science and research.
Speaker: Tobias Rothmund, University of Koblenz-Landau
Though some warnings about online “echo chambers” have been hyperbolic, tendencies toward selective exposure to politically congenial content are likely to extend to misinformation and to be exacerbated by social media platforms. Andrew Guess and his colleagues test this prediction using data on the factually dubious articles known as “fake news.” Using unique data combining survey responses with individual-level web traffic histories, they estimate that approximately 1 in 4 Americans visited a fake news website between October 7 and November 14, 2016. Trump supporters visited the most fake news websites, which were overwhelmingly pro-Trump. However, fake news consumption was heavily concentrated among a small group: almost 6 in 10 visits to fake news websites came from the 10% of people with the most conservative online information diets. Mr. Guess and his colleagues also find that Facebook was a key vector of exposure to fake news and that fact-checks of fake news almost never reached its consumers.
Andrew Guess is an assistant professor of politics and public affairs at Princeton University. His research sits at the intersection of political communication, public opinion, and political behavior. He uses a combination of experimental methods, large datasets, machine learning, and innovative measurement to study how people choose, process, spread, and respond to information about politics. Current or recent projects investigate online selective exposure, the dynamics of interest group mobilization over Twitter, and the persuasive effect of new information on individuals’ attitudes and beliefs.
Speaker: Andrew Guess, Princeton University
Social media technology is young, but has already played a part in numerous turbulent events across the world, from protests to highly polarized elections. The use of social media for misinformation, trolling and harassment has often led to them being described as a threat to democracy. Yet, not long ago, social media were seen as the spearhead of democratizing forces and of activists trying to make their voices heard in autocracies and, allegedly, as the cause of mass protest mobilisation in both democracies and autocracies. Moving beyond this simple binary idea about social media’s impact on democracy, this presentation focuses on two different cases in which social media operate as a challenge and as an opportunity for democracy. Demonstrating the corrosive effects of social media incivility on online discussions, as well as their empowering impact on informal networks of citizens seeking ways to provide solidarity in conditions of institutional collapse, the presentation highlights the importance of context in understanding the broader implications of social media for democracy.
Yannis Theocharis is Assistant Professor at the Department of Media Studies and Journalism of the University of Groningen. He previously held positions as Alexander von Humboldt postdoctoral fellow and research fellow at the Mannheim Centre for European Social Research, where he was co-director of the “Social Media Networks and the Relationships between Citizens and Politics” project. His research interests are in political communication, political behaviour, and social networks. His work on these topics has appeared in political science, communication and interdisciplinary journals such as Journal of Communication, Journal of Computer-Mediated Communication, Social Science Computer Review, Electoral Studies, European Political Science Review and Journal of Democracy. The focus of his current research is on social media and incivility, and his book (co-authored with Jan W. van Deth) "Political Participation in a Changing World: Conceptual and Empirical Challenges" was published in 2018 by Routledge.
Speaker: Yannis Theocharis, University of Groningen
Recommender systems have become increasingly pervasive in our daily lives to support us in identifying relevant content in an overloaded information space. Much of the research in the Recommender Systems community has focused on building (mostly data-driven) recommendation models, which make strong and sometimes oversimplified assumptions about human behavior and preferences. In this talk, Elisabeth Lex will show how psychological insights can be used to develop new recommender algorithms, which better reflect and predict user behaviour. First, a hashtag recommendation algorithm is introduced that mimics how people access information in their long-term memory. Ms. Lex and her colleagues found that temporal effects play a strong role in hashtag usage. Second, a computational model of human category learning is used to improve Collaborative Filtering by incorporating non-linear user-resource dynamics. Finally, the recent work of Elisabeth Lex and her colleagues on echo chambers and algorithmic fairness in recommender systems will be discussed.
Elisabeth Lex is an assistant professor at Graz University of Technology. She heads the Social Computing research area at Know-Center, Austria's Research Center for Data-driven Business and Big Data Analytics. Her research interests include Recommender Systems, Social Network Analysis, Data Science, and Open Science. Elisabeth has been work package leader in the FP7 IP Learning Layers project, scientific coordinator of the Marie Curie IRSES Web Information Quality Evaluation Initiative (WIQ-EI) project, and task leader in the H2020 Analytics for Everyday Learning (AFEL) project. Recently, Elisabeth has been a member of the Expert Group on Altmetrics, which advised the European Commission, DG Research and Innovation on how to use Social Media signals to measure scientific impact. Elisabeth has (co-)authored 60+ peer-reviewed publications and regularly acts as reviewer and chair for major international conferences and journals. Among other courses at Graz University of Technology, Elisabeth teaches Web Science and she will start a new Complex Systems course in 2018.
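The idea of a memory-inspired, time-sensitive hashtag ranking can be sketched as follows. This is a minimal illustration that assumes a power-law decay of past hashtag uses in the style of the ACT-R base-level learning equation; whether the talk uses exactly this form is an assumption, and the function names, the decay value of 0.5, and the toy usage history are hypothetical.

```python
import math

def base_level_activation(use_times, now, decay=0.5):
    """Log of the summed, power-law-decayed traces of past uses.

    use_times: timestamps at which a hashtag was used (same unit as `now`).
    decay:     hypothetical decay exponent; larger values forget faster.
    """
    total = sum((now - t) ** (-decay) for t in use_times if t < now)
    # A hashtag that was never used before `now` gets the lowest possible score.
    return math.log(total) if total > 0 else float("-inf")

def recommend(history, now, k=3):
    """Return the k hashtags with the highest activation at time `now`."""
    scores = {tag: base_level_activation(times, now) for tag, times in history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy history (made up): "#old" was used often but long ago, "#new" recently.
history = {"#old": [1, 2, 3], "#new": [98, 99]}
print(recommend(history, now=100, k=1))
```

In this toy history the recently used hashtag outranks the one used more often but long ago, which is the kind of temporal effect the abstract refers to.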
Speaker: Elisabeth Lex, Graz University of Technology
As publishing has become more and more accessible and basically cost-free, virtually anyone can get their words printed, whether online or on paper. Such ease of disseminating content doesn't necessarily go together with author identifiability. In other words: it's very simple for anyone to publicly write any text, but it isn't equally simple to always tell who the author of a text is. Telling the author of a text can be thought of at various levels of detail. For example, in some contexts, and possibly in the interest of companies who want to advertise, or legal institutions, it can correspond to profiling, namely defining certain characteristics of the author, such as sex and age. In other contexts, and in the interest also of ancient and contemporary literary or historical studies, identifying authors can mean being able to tell whether two texts are likely to have been written by the same person. The latter problem can take more than one form in practice, as one could be faced with one unknown text to compare to another one written by a known author, or could be given a large number of unknown texts to be clustered according to authorship. To what extent is all this feasible? And is it meaningful?
In this talk, Malvina Nissim will discuss the specifics of such tasks, and describe a couple of systems that perform author profiling and author verification on different kinds of texts from different languages, experimenting with various linguistic and structural features. Ms. Nissim will also discuss such systems and their performance not only in terms of how they fare, but also in terms of what it means to profile and identify authors, and what challenges lie ahead for people working in this field.
Malvina Nissim is Associate Professor in Language Technology at the University of Groningen. She has extensive experience in modelling language phenomena from a computational perspective, with particular attention to sentiment analysis and author identification and profiling, especially on social media. She has (co-)authored 90+ peer-reviewed publications, and regularly serves as reviewer/chair for major international conferences and journals. She graduated in Linguistics at the University of Pisa, and obtained her PhD in Computational Linguistics at the University of Pavia, in collaboration with the University of Edinburgh. Before joining the University of Groningen, she was a tenured researcher at the University of Bologna (2006-2014), and a post-doc at the University of Edinburgh (2001-2005), and the National Research Council in Rome (2005-2006). She is the University of Groningen's 2016 Lecturer of the Year.
Speaker: Malvina Nissim, Language Technology, University of Groningen
In recent years, social media and online social networking sites have become a major disseminator of false facts, urban legends, fake news, or, more generally, misinformation. To overcome this problem, online platforms are, on the one hand, empowering their users (the crowd) with the ability to evaluate the content they are exposed to and, on the other hand, resorting to trusted third parties for fact-checking stories. However, given the noise in the evaluations provided by the crowd and the high cost of fact checking, the above-mentioned measures require careful reasoning and smart algorithms. In this talk, the author will first describe a modeling framework based on marked temporal point processes that links noisy evaluations provided by the crowd to robust, unbiased and interpretable notions of information reliability and source trustworthiness. Then, the author will introduce a scalable online algorithm, CURB, to select which stories to send for fact checking and when to do so to efficiently reduce the spread of fake news and misinformation with provable guarantees. Finally, Manuel Gomez Rodriguez will show the effectiveness of his team's modeling framework and their algorithm using real-world data gathered from Wikipedia, Stack Overflow, Twitter and Weibo. This talk includes joint work with Behzad Tabibian, Jooyeon Kim, Isabel Valera, Mehrdad Farajtabar, Le Song, Alice Oh and Bernhard Schoelkopf.
Manuel Gomez Rodriguez is a tenure-track faculty member at the Max Planck Institute for Software Systems. Manuel develops machine learning and large-scale data mining methods for the analysis, modeling and control of large social and information online systems. He is particularly interested in the creation, acquisition and/or dissemination of reliable knowledge and information, which is ubiquitous in the Web and social media, and has received several recognitions for his research, including an Outstanding Paper Award at NIPS’13 and a Best Research Paper Honorable Mention at KDD’10 and WWW’17. Manuel holds a BS in Electrical Engineering from Carlos III University in Madrid (Spain), an MS and PhD in Electrical Engineering from Stanford University, and has received postdoctoral training at the Max Planck Institute for Intelligent Systems.
Speaker: Manuel Gomez Rodriguez, Max Planck Institute for Software Systems
Online Social Networks (OSN) are increasingly being used as a platform for effective communication, to engage with other users and to create social worth. The more likes, followers and shares a user receives on an OSN platform, the higher the user's social self-worth. Such metrics and crowdsourced ratings give the OSN user a sense of social reputation which she tries to maintain and boost to be more influential in the network and attract a larger following. Users sometimes artificially bolster their social reputation via blackmarket web services and crowdsourced manipulation services. In this talk, the author will describe various approaches to detect users with manipulated social reputation. The author and her colleagues have formulated an effective method which estimates the genuine social reputation of users with manipulated social metrics. In this talk, the author will go a step further to not only detect users with manipulated social reputation, but also to predict the correct social reputation of a user. Anupama Aggarwal and her colleagues use various attack models, and show that their prediction of a user’s social reputation is tolerant against blackmarket services and crowdsourced manipulation.
Anupama Aggarwal is a PhD student at IIIT-Delhi, India and a member of Precog@IIITD. Her research focuses on the study of anomalous user behavior on online social networks. In general, her interests are social computing and data mining on social graphs to understand user behavior.
Speaker: Anupama Aggarwal, Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi)
The self-measurement boom is linked to many risks despite euphoric assessments and promises of benefit by developers, pioneers and companies. Lifelogging (the sum of all technologies and applications used for digital self-measurement) as a ‘disruptive’ technology is changing our cultural matrix and thereby the institutionalized rules of coexistence. The cultural baseline that is currently changing is the manner in which the quantifiable consumer regards something as normal and socially desirable.
Measuring humans has always been an expression of rationalisation tendencies that have social implications. Over time, these tendencies have led to a new image of humanity, which is currently experiencing an update. The modern image of society is characterised by the translation of concrete objects and complex qualitative processes into abstract quantities. Lifelogging technologies have proven themselves to be an outstanding medium for that. The measurement of man and the reduction of such to a numerical object and a mere data set is creating a negative principle of organisation of the social. Self observation based on digital data is not only becoming more exact, it is also becoming increasingly divisive. The counter term to rational differentiation is, therefore, rational discrimination. This resulting phenomenon is located as a pathology of quantification between statistical and social discrimination, and analysed in its consequences.
From the perspective of cultural anthropology, digital self measurement is nothing more than a modern-day return to the alchemistic principle. The starting point is the ‘common’ person, the human who is not yet fully developed, or the human who represents a risk or a source of error or disturbance for society. With the help of quantification, one’s lifestyle is said to become more rational. And in accordance with social standards ‘common’ people should be transformed into ‘precious’ people. The effects of this digital transformation are explained against the background of theories about convivial tools (Ivan Illich), greedy institutions (Lewis Coser) and the outsourced self (Arlie Hochschild) in a society of assistance.
Stefan Selke is professor of “Sociology and Social change” at Furtwangen University (http://en.hs-furtwangen.de) in Germany. He is also a research professor for “Transformative and Public Science”. His current research interests are the economy of poverty, reputation capital in the charity market, public sociology and the digitalisation of society.
Speaker: Stefan Selke, Furtwangen University
Nowadays, music aficionados generate millions of listening events every day and share them via services such as Last.fm or Twitter. In 2016, the LFM-1b dataset (http://www.cp.jku.at/datasets/LFM-1b), containing more than 1 billion listening events of about 120,000 Last.fm users, was released to the research community and interested public. Since then, we have performed various data analysis and machine learning tasks on these large amounts of user and listening data. The insights gained helped to develop new listener models and integrate them into music recommender systems, in an effort to increase personalization of the recommendations. In this talk, I will elaborate on the following research topics we have targeted in the past two years:
Speaker: Markus Schedl, Johannes Kepler University Linz, Department of Computational Perception
Social media has revolutionized how people are exposed to information and how they consume news. Beyond the undoubtedly large number of advantages and capabilities brought by social-media platforms, a point of criticism has been the creation of filter bubbles or echo chambers, caused by social homophily as well as by algorithmic personalisation and recommendation in content delivery. In this talk, I will present the methods we developed to (i) detect and quantify the existence of polarization on social media, (ii) monitor the evolution of polarization over time, and finally, (iii) devise methods to overcome the effects caused by increased polarization. We combine existing studies and ideas from social science with principles from graph theory to design algorithms which are language independent, domain agnostic and scalable to a large number of users.
Kiran Garimella is a PhD student at Aalto University. His research focuses on identifying and combating polarization on social media. In general he is interested in making use of large public datasets to understand human behaviour. Prior to starting his PhD, he worked as a Research Engineer at Yahoo Research, QCRI and as an intern at Carnegie Mellon University, LinkedIn and Amazon. His work on polarization received the best student paper award at WSDM’17 and a best paper nomination at WebScience 2017.
Speaker: Kiran Garimella, Aalto University
The talk argues for the importance of forbidden triads (open triads with high weight edges) in predicting success in creative fields. Forbidden triads have been treated as a residual category beyond closed and open triads, yet we argue that they provide opportunities to combine socially evolved styles in new ways. Using data on the entire history of recorded jazz from 1896 to 2010, we show that observed collaborations have tolerated the openness of high weight triads more than expected, observed jazz sessions had more forbidden triads than expected, and the density of forbidden triads contributed to the success of recording sessions, measured by the number of releases out of the sessions’ material. The author also shows that the sessions of Miles Davis received an especially high boost from forbidden triads.
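To make the notion of a forbidden triad concrete, the following minimal sketch counts them in a small weighted collaboration graph. It assumes that a forbidden triad is a centre node with two high-weight ties whose other ends share no tie at all; the cutoff `strong`, the function name, and the toy graph are hypothetical and are not taken from the study.

```python
from itertools import combinations

def count_forbidden_triads(weights, strong=3):
    """Count open triads whose two existing edges are both high-weight.

    weights: dict mapping frozenset({a, b}) -> tie weight (e.g. joint sessions).
    strong:  hypothetical cutoff at or above which a tie counts as high-weight.
    """
    # Keep only the high-weight ties, stored as an adjacency map.
    strong_nbrs = {}
    for pair, w in weights.items():
        if w >= strong:
            a, b = tuple(pair)
            strong_nbrs.setdefault(a, set()).add(b)
            strong_nbrs.setdefault(b, set()).add(a)

    count = 0
    for nbrs in strong_nbrs.values():
        # Two strong neighbours of the same centre with no tie between them
        # form one forbidden triad; each such triad has exactly one centre.
        for u, v in combinations(sorted(nbrs), 2):
            if frozenset((u, v)) not in weights:
                count += 1
    return count

# Toy graph (made up): a-b and a-c are strong, b-c is absent, b-d is weak.
toy = {
    frozenset(("a", "b")): 5,
    frozenset(("a", "c")): 4,
    frozenset(("b", "d")): 1,
}
print(count_forbidden_triads(toy))
```

Adding even a weak tie between `b` and `c` closes the triad, so it no longer counts under this definition.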
Speaker: Balazs Vedres, Central European University
We are witnessing a momentous transformation in the way people interact and exchange information with each other. Content is now co-produced, shared, classified and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This content explosion is to a large extent driven by a mix of novel technologies with a deep human drive for recognition.
This talk will describe the regularities that govern how social attention is allocated among all media and the role it plays in the production and consumption of content. It will also describe how its dynamics not only help determine the emergence of public agendas but can also be used to predict the evolution of social trends.
Speaker: Bernardo Huberman, HP Labs and Stanford University
The many decisions people make about what information to attend to affect emerging trends, the diffusion of information in social media, and performance of crowds in peer evaluation tasks. Due to constraints of available time and cognitive resources, the ease of discovery strongly affects how people allocate their attention. Through empirical analysis and online experiments, we identify some of the cognitive heuristics that influence individual decisions to allocate attention to online content and quantify their impact on individual and collective behavior. Specifically, we show that the position of information in the user interface strongly affects whether it is seen, while explicit social signals about its popularity increase the likelihood of response. These heuristics become even more important in explaining and predicting behavior as cognitive load increases. The findings suggest that cognitive heuristics and information overload bias collective outcomes and undermine the “wisdom of crowds” effect.
Kristina Lerman is a Project Leader at the University of Southern California Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department. Trained as a physicist, she now applies network- and machine learning-based methods to problems in social computing and social media analysis.
Speaker: Kristina Lerman, University of Southern California
For decades, physical behavioral labs have been a primary, yet limited, method for controlled experimental studies of human behavior. Now, software-based "virtual labs" on the Internet allow for studies of increasing complexity, size, and scope. In this talk, I highlight the potential of virtual lab experiments for studying social interaction and coordination. First, we explore collective intelligence and digital teamwork in "crisis mapping", where digital volunteers organize to assess and pinpoint damage in the aftermath of humanitarian crises. We simulate a crisis mapping scenario to study self-organization in teams of varying size, and find a tradeoff between individual effort in small groups and collective coordination in larger teams. We also conduct a study of cooperation in a social dilemma over a month of real time, using crowdsourcing participants to overcome the time constraints of behavioral labs. Our study of about 100 participants over 20 consecutive weekdays finds that a group of resilient altruists sustain a high level of cooperation across the entire population. Together, our work motivates the potential of controlled, highly instrumented studies of social interaction; the importance of behavioral experiments on longer timescales; and how open-source software can both speed up the iteration and improve the reproducibility of experimental work.
* based on joint work with Lili Dworkin, Winter Mason, Siddharth Suri, and Duncan Watts.
Andrew Mao is currently a postdoctoral researcher in Computational Social Science at Microsoft Research in NYC. His research focuses on studying collective intelligence and social interaction on the Internet, such as teamwork in online communities and coordination in crowdsourcing systems. Andrew specializes in designing and gathering data from real-time, interactive, web-based behavioral experiments, and he is the designer of TurkServer (http://turkserver.readthedocs.io/), an open-source platform for building such experiments. His work has appeared in journals including Nature Communications and PLoS ONE as well as computer science conferences such as AAAI, EC, and HCOMP. He received his PhD from Harvard University in 2015.
Speaker: Andrew Mao, Microsoft Research NYC
Technology has advanced to a point where a large part of the population carries a mini-computer disguised as a phone in their pockets. The role of phones has changed from simple communication devices to multi-functional information devices. They are packed with various sensors, such as GPS, gyroscopes, and accelerometers, which can collect contextual data about the device and the user. Thanks to a multitude of applications, they satisfy a large spectrum of different needs and stay with their users most of the time. Smartphone usage is a hot topic in ubiquitous and pervasive computing due to the devices' popularity and personal nature.
The Menthal team has developed the Menthal framework (https://menthal.org) for collecting and analyzing mobile users' data. It is part of one of the largest in-the-wild smartphone studies. They attracted a large number of participants by running the study in a start-up format: building a product that users want and then promoting it through media outlets. Since the launch of the project in January 2014, their app has been installed more than 400,000 times and their project has attracted more than 350,000 registered participants. From these, they have collected general phone measurements, such as time spent on the phone using apps or communicating, but also other interesting data such as mood/affect measurements and Big Five personality traits. The collected data allow the study of a number of problems in HCI, psychological sciences and medicine. In this talk they will present their framework from a technical point of view and afterwards they will discuss past and current results from their research project.
Speaker: Ionut Andone, University of Bonn
The way we express ourselves is heavily influenced by our demographic background. For example, we don't expect teenagers to talk the same way as retirees. Natural Language Processing (NLP) models, however, are based on a small demographic sample and approach all language as uniform. As a result, NLP models perform worse on language from demographic groups that differ from the training data, i.e., they encode a demographic bias. This bias harms performance and can disadvantage entire user groups.
Sociolinguistics has long investigated the interplay of demographic factors and language use, and it seems likely that the same factors are also present in the data we use to train NLP systems.
In this talk, I will show how we can combine statistical NLP methods and sociolinguistic theories to the benefit of both fields. I present ongoing research into large-scale statistical analysis of demographic language variation to detect factors that influence the performance (and fairness) of NLP systems, and how we can incorporate demographic information into statistical models to address both problems.
Speaker: Dirk Hovy, Computer Science department (DIKU), University of Copenhagen
With the increase in usage of the Internet, there has been an exponential increase in the use of online social media. Websites like Facebook, Google+, YouTube, Orkut, Twitter and Flickr have changed the way the Internet is being used. There is a dire need to investigate, measure, and understand privacy and security on online social media from various perspectives (computational, cultural, psychological). Real-world scalable systems need to be built to detect and defend against security and privacy issues on online social media. I will briefly describe some cool projects that we work on: TweetCred, OSM & Policing, OCEAN, and Call Me MayBe. Much of our research work is made available for public use through tools or online services. Our work derives techniques from Computational Social Science, Data Science, Statistics, Network Science, and Human Computer Interaction. In particular, in this talk, I will focus on the following: (1) TweetCred, a tool to extract intelligence from Twitter which can be useful to security analysts. TweetCred is backed by award-winning research publications in international and national venues. (2) How police in India are using online social media, and how we can use computer science understanding to help police engage more with citizens and increase safety in society. (3) OCEAN: Open source Collation of eGovernment data and Networks, which shows how publicly available information on Government services can be used to profile citizens in India. This work obtained the Best Poster Award at the Security and Privacy Symposium at IIT Kanpur, 2013, and it has gained a lot of traction in Indian media. (4) Given an identity in one online social media service, we are interested in finding the digital footprint of the user in other social media services; this is also called the digital identity stitching problem. This work is also backed by an award-winning research publication.
Speaker: Ponnurangam Kumaraguru, Indraprastha Institute of Information Technology (IIIT), Delhi, India
Research into socio-technical systems like Wikipedia has overlooked important structural patterns in the coordination of distributed work. This paper argues for a conceptual reorientation towards sequences as a fundamental unit of analysis for understanding work routines in online knowledge collaboration. I outline a research agenda for computational social science researchers to understand the relationships, patterns, antecedents, and consequences of sequential behavior extending methods already developed in fields like sociology and bio-informatics. Using a data set of 37,515 revisions from 16,616 unique editors to 96 Wikipedia articles as a case study, we analyze the prevalence and significance of different sequences of editing patterns. We illustrate the mixed method potential of sequence approaches by interpreting the frequent patterns as general classes of behavioral motifs. We conclude by discussing the methodological opportunities for using sequence analysis for expanding existing approaches to analyzing and theorizing about co-production routines in online knowledge collaboration.
Speaker: Brian Keegan, Harvard Business School
Determining the relative centrality of actors, or the degree to which they are structurally important, is one of the most common techniques in social network analysis. Many indices have been proposed to measure a variety of centrality conceptions, and choosing the one that is most appropriate for the particular research question and data at hand proves a challenge in many empirical studies. We use a general result about all common centrality indices to motivate a re-conceptualization of centrality. Our new approach is the first instantiation of a recently introduced positional framework for network analysis. By breaking down complex analyses into comprehensible steps, multivariate data and theoretical assumptions can be integrated more flexibly. Several examples serve to illustrate this point.
Speaker: Ulrik Brandes, University of Konstanz
Characterising how we explore abstract spaces is key to understanding our (ir)rational behaviour and decision making. While some light has been shed on the navigation of semantic networks, little is known about the mental exploration of metric spaces, such as the one-dimensional line of numbers, prices, etc. Here we address this issue by investigating the behaviour of users exploring the “bid space” in online auctions. We find that they systematically perform Lévy flights, i.e., random walks whose step lengths follow a power-law distribution. Interestingly, this is the best strategy that can be adopted by a random searcher looking for a target in an unknown environment, and has been observed in the foraging patterns of many species. In the case of online auctions, we measure the power-law scaling over several decades, providing the neatest observation of Lévy flights reported so far. We also show that the histogram describing single individual exponents is well peaked, pointing out the existence of an almost universal behaviour. Furthermore, a simple model reveals that the observed exponents are nearly optimal, and represent a Nash equilibrium. We rationalise these findings through a simple evolutionary process, showing that the observed behaviour is robust against invasion of alternative strategies. Our results show that humans share with other animals universal patterns in general searching processes, and raise fundamental issues in cognitive, behavioural and evolutionary sciences.
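A Lévy flight on a one-dimensional line, as described above, can be simulated with a few lines of code. The sketch below draws step lengths from a power law p(l) ∝ l^(-mu) for l ≥ l_min by inverse-transform sampling; the exponent `mu = 2.0`, the minimum step `l_min`, and the function names are illustrative assumptions, not values or code from the study.

```python
import random

def levy_steps(n, mu=2.0, l_min=1.0, seed=0):
    """Draw n signed steps whose lengths follow p(l) ~ l**(-mu), l >= l_min.

    Uses inverse-transform sampling, which requires mu > 1.
    """
    rng = random.Random(seed)
    steps = []
    for _ in range(n):
        u = rng.random()  # uniform in [0, 1), so 1 - u is never zero
        length = l_min * (1.0 - u) ** (-1.0 / (mu - 1.0))
        direction = rng.choice((-1, 1))  # move left or right on the line
        steps.append(direction * length)
    return steps

def walk(start, steps):
    """Accumulate the steps into the sequence of visited positions."""
    positions = [start]
    for s in steps:
        positions.append(positions[-1] + s)
    return positions

path = walk(0.0, levy_steps(1000))
print(len(path), max(abs(b - a) for a, b in zip(path, path[1:])))
```

Most steps stay close to `l_min`, while occasional very long jumps dominate the distance covered, which is the signature that distinguishes a Lévy flight from an ordinary random walk with bounded steps.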
Speaker: Andrea Baronchelli, City University London
Human mobility has been a hot topic of interest for researchers due to its importance for many application scenarios that include nearby place search, mobile context awareness or mobile advertising. Despite the bevy of research on human mobility pattern analysis and prediction modeling of individual users, little attention has been paid to the mobility patterns of user collectives across places in a city. In this talk, we will exploit network analysis techniques to view human movement in urban environments from the perspective of an aggregate networked system where nodes are Foursquare venues. We will discuss the geometric properties of place networks in a large number of metropolitan areas around the world and how those compare to other well studied types of networks, such as on-line social networks or the web.
Next, we will shed light on the growth patterns of place networks in terms of node and edge generation processes. Motivated by the fact that a large number of new links emerge over time in those networks, we will define a link prediction task in this novel application domain with the aim of predicting future interactions between Foursquare venues. The talk will close with a head-to-head comparison on the prediction task among gravity models, which are well known in the human mobility literature, network-based techniques, and supervised learning algorithms.
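Two of the predictor families named above can be illustrated with a small sketch: a textbook gravity score and a common-neighbours score for a candidate link between two venues. The abstract does not specify the exact formulas, so the forms below, the use of check-in counts as "mass", planar coordinates, and all names and numbers are assumptions for illustration only.

```python
import math

def gravity_score(mass_i, mass_j, pos_i, pos_j, exponent=2.0):
    """Textbook gravity form: product of masses over distance**exponent.

    Assumes the two venues sit at distinct positions (distance > 0).
    """
    d = math.dist(pos_i, pos_j)
    return mass_i * mass_j / d ** exponent

def common_neighbours(adj, i, j):
    """Network-based score: number of venues already linked to both i and j."""
    return len(adj[i] & adj[j])

# Hypothetical venues: check-in counts, planar positions (km), existing links.
checkins = {"cafe": 10, "gym": 20, "park": 5}
position = {"cafe": (0.0, 0.0), "gym": (3.0, 4.0), "park": (1.0, 0.0)}
adj = {"cafe": {"park"}, "gym": {"park"}, "park": {"cafe", "gym"}}

pair = ("cafe", "gym")
print(gravity_score(checkins[pair[0]], checkins[pair[1]],
                    position[pair[0]], position[pair[1]]))
print(common_neighbours(adj, *pair))
```

In a link prediction setting, each candidate pair of venues would be scored this way and the highest-scoring pairs predicted as future links; a supervised learner could take such scores as input features.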
Speaker: Anastasios Noulas, University of Cambridge
In the era of big data and social media analysis, as a way forward, I propose an alternative to vanity metrics or the quantification of trend and personal influence. Rather, for the study of Twitter, Facebook and other secondary social media, I would like to put forward a critical data analytics that is sensitive to big data critique on the one hand and embraces analytical strategies with digital methods based on expertise and engagement on the other hand, making findings and outputting visualisations which are both insightful for (ethical) social research and aware of the hegemony of the graph.
Richard Rogers is Department Chair of Media Studies and Professor of New Media and Digital Culture at the University of Amsterdam. He is author most recently of Digital Methods (MIT Press, 2013), winner of the ICA outstanding book award, and Issue Mapping for an Ageing Europe (Amsterdam University Press, 2015), with Natalia Sanchez and Aleksandra Kil. He is Director of the Digital Methods Initiative and the Govcom.org Foundation, known for online mapping tools such as the Issue Crawler and the Lippmannian Device. He has received research grants from the Ford Foundation, Gates Foundation, MacArthur Foundation, Open Society Institute and Soros Foundation, and has worked with such NGOs as Greenpeace International, Human Rights Watch, Association for Progressive Communications, Women on Waves, Carbon Trade Watch and Corporate Observatory Europe.
Speaker: Richard Rogers, Digital Methods Initiative, University of Amsterdam
Traditionally, most statistical and media coverage of football has focused almost exclusively on goals and (occasionally) shots. However, most of the duration of a football game is spent away from the boxes, passing the ball around. The way teams pass the ball around is the most characteristic measurement of what a team’s “unique style” is. In this talk we will showcase how the study of a passing network keeps track of the team’s playing style, and how network invariants such as PageRank provide an adequate measurement of players' involvement. Next, we will proceed further into the analysis of passing sequences and their likely outcomes, and show how the passing patterns allow us to construct a “digital fingerprint” of a player’s style.
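As an illustration of PageRank on a passing network, the sketch below runs a plain power iteration over a weighted directed graph in which an edge from one player to another carries the number of passes between them. The damping factor, the iteration count, the function name, and the toy pass counts are assumptions; the talk's actual data and parameter choices are not given here.

```python
def pagerank(passes, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    passes: dict mapping passer -> {receiver: number of passes}.
    Returns a dict player -> score; the scores sum to 1.
    """
    players = sorted(set(passes) | {r for out in passes.values() for r in out})
    n = len(players)
    rank = {p: 1.0 / n for p in players}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in players}
        for p in players:
            out = passes.get(p, {})
            total = sum(out.values())
            if total == 0:
                # A player with no recorded passes spreads rank evenly.
                for q in players:
                    new[q] += damping * rank[p] / n
            else:
                # Rank flows to receivers in proportion to pass counts.
                for q, w in out.items():
                    new[q] += damping * rank[p] * w / total
        rank = new
    return rank

# Toy pass counts (made up) between four positions.
toy_passes = {
    "gk": {"def": 5},
    "def": {"mid": 8, "gk": 2},
    "mid": {"fwd": 6, "def": 4},
    "fwd": {"mid": 3},
}
for player, score in sorted(pagerank(toy_passes).items(), key=lambda kv: -kv[1]):
    print(player, round(score, 3))
```

A player scores highly when they receive many passes from teammates who are themselves well involved, which is why the measure can serve as a proxy for involvement in the team's play.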
Speaker: Javier López Peña, University College London
The convergence of social and technical systems provides us with a wealth of data on the structure and dynamics of social organizations. It is tempting to utilize these data in order to better understand how social organizations evolve, how their structure is related to their "success", and how the position of individuals in the emerging social fabric affects their performance and motivation. Taking a complex network perspective on these questions, in this talk I will introduce recent research results obtained in the context of collaborative software engineering. These results demonstrate the potential of network-based data mining methods in the study of social organizations. At the same time, I will highlight fallacies arising in the application of the complex networks perspective to social systems.
Speaker: Ingo Scholtes, ETH Zürich
It has become popular to tap into the "intelligence of the crowd" on the Internet. This talk argues that more often than not, the crowd flips from intelligence to madness, showing more characteristics of football hooligans than complex problem solving behavior. This is in contrast to what I call "creative swarms", where small teams of intrinsically motivated people work together in Collaborative Innovation Networks (COINs) to invent something radically new. The key difference is in motivation: crowds are motivated by money, power and glory, while swarms are intrinsically motivated by the problems they are trying to solve.
The talk introduces a collaboration scorecard made up of six key variables – “honest signals” – indicative of creative swarms. The variables are computed by analyzing global communication on the Web, in Twitter, and Wikipedia, in organizations through e-mail, and in small teams through sociometric badges.
The talk is illustrated by many examples, with emphasis on high-tech firms and healthcare. For instance, it illustrates how customer satisfaction and employee attrition are predicted in a large Indian outsourcing company by analyzing the company’s e-mail archive. It also introduces the Collaborative Chronic Care Network (C3N) at Cincinnati Children's Hospital, where COINs of medical researchers, physicians, patients and their families are working together to improve the lives of patients with Crohn's disease, diabetes, and cystic fibrosis. Analyzing the e-mail archive of the C3N innovation teams and providing them with a process called “virtual mirroring”, in which the communication behavior of creative teams is mirrored back to them, helps them to increase creativity through improved communication.
Speaker: Peter A. Gloor, MIT Sloan School of Management
Query-specific Wikipedia Construction
We all turn to Wikipedia with questions we want to know more about, but eventually find ourselves at the limit of its coverage. Instead of providing "ten blue links", as is common in web search, my goal is to answer any web query with something that looks and feels like Wikipedia. I am developing algorithms to automatically retrieve, extract, and compile a knowledge resource for a given web query. I will talk about a supervised retrieval model that can jointly identify relevant web documents and Wikipedia entities and extract supporting passages.
Network Topic Models
Topic models such as Latent Dirichlet Allocation are an unsupervised technique for extracting word clusters with topical character from a given corpus of text documents. Often we find text documents with an underlying link structure, or a network in which nodes are associated with text content. It is often assumed that connected nodes have some shared trait or interest which motivated the forming of the connection. In this talk, I will discuss several topic model extensions for textual network data. These include the Citation Influence Model, which quantifies the strength of a citation in an acyclic graph through a topic model. Furthermore, I will discuss the Shared Taste Model, which learns topics that capture shared interests in an undirected social network. As communication between users is often off-limits due to privacy concerns, the model learns from public text written by users, such as tweets, tags, and posts. The goal is to predict which friend of the user is interested in the content. The source code for both models is available on GitHub.
Speaker: Laura Dietz, Center for Intelligent Information Retrieval (CIIR) at University of Massachusetts
Recent work: Improving Website Hyperlink Structure Using Server Logs
Good websites should be easy to navigate via hyperlinks, yet maintaining a link structure of high quality is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes are frequent. To support human editors, we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of as yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia and Simtk.org. (Joint work with Ashwin Paranjape and Jure Leskovec of Stanford, and Leila Zia of Wikimedia)
Ongoing work: Media Coverage of Death
Death is an inevitable fact of the human condition and as such draws much attention. The deaths of famous people tend to be widely covered by the media in the form of obituaries and news articles, and may lead to a sustained change in the way their lives are collectively remembered. In this work in progress, we ask how deceased famous people are remembered by the media. To shed light on this question, we identify a set of notable people who died during the six years from 2008 to 2014 and track them in a large corpus of news articles and blog posts spanning the entire six-year period. Our results show that death generally has a profound impact on how people are perceived by the media. Further, we find that posthumous media coverage varies with the circumstances of death and the biographic background of the deceased. (Joint work with Jure Leskovec and Christopher Potts of Stanford)
Speaker: Robert West, InfoLab at Stanford University
At the beginning of 2014, in response to growing concerns about the role played by data mining and machine learning algorithms in decision-making, US President Obama called for a 90-day review of big data collection and analysis practices. The resulting report concluded that “big data technologies can cause societal harms beyond damages to privacy”. In particular, it expressed concerns about the possibility that decisions informed by big data could have discriminatory effects, even in the absence of discriminatory intent, further imposing less favorable treatment on already disadvantaged groups. In its recommendations to the President, the report called for additional "technical expertise to stop discrimination", and for further research into the dangers of "encoding discrimination in automated decisions".
In parallel to developments in anti-discrimination legislation, efforts at fighting discrimination have led to the development of anti-discrimination techniques in data mining. Some proposals are oriented to the discovery and measurement of discrimination, while others deal with preventing data mining from itself becoming a source of discrimination through automated decision making based on discriminatory models extracted from inherently biased datasets. In this talk, I will introduce some of the recent techniques for discrimination prevention, simultaneous discrimination and privacy protection, and discrimination discovery, and show some recent results.
Speaker: Sara Hajian, Eurecat-Technology center of Catalonia
Due to its low acquisition cost and sheer scale, social media is becoming a popular data source for studies that track health trends at scale. These studies usually take a population-centric approach in which the “ground truth” used for validation and model fitting is derived either from time series data, e.g. from temporal influenza activity, or from geographically varied data, e.g. from county-level obesity rates in the US. In this talk, I will present recent and ongoing work that uses social media data to study lifestyle diseases such as obesity. The first line of work takes the population-centric approach and uses food mentions on Twitter to study obesity. I will then move on to individual-centric health studies of obesity and dieting. Such fine-grained analysis is made possible by (i) labeling individual users as “is overweight or not” using their profile pictures, and (ii) analyzing users whose internet-enabled smart scales tweet their weight. I’ll conclude by outlining a vision of how social media data, physical sensor data, and electronic health records could be combined in a clinical setting to provide a more holistic view of a patient’s health.
Speaker: Ingmar Weber, Qatar Computing Research Institute
Prominent data scientists have declared “the end of theory” in the era of big data. I argue that it is rather the beginning, due to new opportunities for a relational theoretical framework. From astronomy to neuroscience to particle physics, scientific knowledge depends decisively on the available tools for observation. For the past century (or more), the survey has been the single most important observational tool for social science. During this time, enormous advances have taken place in our ability to reduce systematic bias in sampling hidden populations, to reduce measurement error in the responses to survey items, to reduce statistical error in the analysis of results, and to reduce inferential error in causal models of the associations among the measures. Nevertheless, increasing confidence in survey technology has paradoxically reinforced a debilitating theoretical blinder that has compromised the ability of social science to elicit confidence in predictions. What is worse, this blinder has largely escaped notice through a combination of ideological bias and reluctance to pull back the covers on problems for which we have no solution. The good news is that a solution is finally on the horizon.
Speaker: Michael Macy, Goldwin Smith Professor of Arts and Sciences and Director of the Social Dynamics Laboratory, Cornell University
The personal stories that people post to their public weblogs offer a glimpse into the everyday lives of people. In this talk I will discuss our efforts to automatically gather tens of millions of these stories, and use them as a dataset for investigating different populations of authors. I will discuss our work on analyzing the stories that people tell about health emergencies (strokes), and how this led us to concerns about sample bias. I will describe our ongoing work on bias correction for social-media samples, and discuss opportunities afforded by populations of extremely prolific webloggers whose demographic information can be readily extracted from the stories they share.
Speaker: Andrew S. Gordon, Institute for Creative Technologies, University of Southern California
With the widespread adoption of social media sites like Twitter and Facebook, there has been a shift in the way information is produced and consumed in our societies. Traditionally, information was produced by large news organizations, which broadcast the same carefully-edited information to all consumers over mass media channels. In contrast, in online social media, any user can be a producer of information, and every user selects which other users she connects to, thereby choosing the information she consumes. Furthermore, recommender systems deployed on most social media sites provide users with additional information that is tailored to their individual tastes.
In this talk, I will introduce the concept of information diet – the topical distribution of a given set of information items (e.g., tweets) – to characterize the information produced and consumed by various types of users on Twitter. At a high level, we find that (i) popular users mostly produce very specialized diets focusing on only a few topics; in fact, news organizations (e.g., NYTimes) produce much more focused diets on social media than in their mass media output, (ii) most users’ consumption diets are primarily focused on one or two topics of interest, and (iii) the personalized recommendations provided by Twitter help to mitigate some of the topical imbalances in users’ consumption diets by adding information on diverse topics beyond the users’ primary topics of interest.
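To make the notion of an information diet concrete, here is a minimal Python sketch that computes the topical distribution of a set of items. It assumes each tweet has already been assigned a single topic label, which is a simplification. The Shannon entropy used to summarize how focused a diet is, the function names, and the toy labels are choices made for this example and are not taken from the talk.

```python
import math
from collections import Counter


def information_diet(topic_labels):
    """Return the share of each topic in a collection of labeled items."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def diet_entropy(diet):
    """Shannon entropy in bits; lower values indicate a more focused diet."""
    return -sum(share * math.log2(share) for share in diet.values() if share > 0)


# Hypothetical topic labels for tweets a user produced and consumed.
produced = ["politics"] * 8 + ["sports"] * 2
consumed = ["politics", "sports", "science", "music"] * 3

produced_diet = information_diet(produced)
consumed_diet = information_diet(consumed)
print(produced_diet, round(diet_entropy(produced_diet), 3))
print(consumed_diet, round(diet_entropy(consumed_diet), 3))
```

A diet concentrated on one or two topics yields a low entropy, while an evenly spread diet yields a high one, which gives a simple way to compare production and consumption diets.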
Speaker: Krishna Gummadi, Max Planck Institute for Software Systems
An interdisciplinary study on the representation of street art on Flickr was conducted at Leibniz Universität Hannover. One sociologist, two computer scientists, and several research assistants took part in the investigation. The starting point of the research project was Ulf Wuggenig's thesis that the internet is also changing the way street art is received and perceived: rather than established art institutions, it is above all the internet that promotes the representation and recognition of street artists. To examine this thesis, a visual content analysis was carried out. The research project concluded that a street-art-"oriented" community exists on Flickr and drives the representation of street art there. In my talk I will therefore discuss how the community can be described on the basis of available metadata, and how street art is represented on Flickr. In this context I will also address further questions that have arisen for me from the research.
Speaker: Axel Philipps, Leibniz University of Hannover
Wikipedia is a huge global repository of human knowledge, and at the same time one of the largest experiments of online collaboration. Its articles, their links and the negotiations around their content tend to reflect societal debates in different language communities. The first work David Laniado will present is Contropedia, a platform that adds a layer of transparency to Wikipedia articles. Combining activity from the edit history and discussion in talk pages, the platform uses wiki links as focal points to explore the development of controversial issues over time. The second study focuses on the network of hyperlinks in different language editions of Wikipedia. A ranking of the most central biographies in each language edition is used to study relationships and influences between cultures. Finally, the speaker will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions, focusing on how emotion and dialogue differ depending on the editors' status, gender, and communication network.
Speaker: David Laniado, Barcelona Media
Modeling social media and blogging is a fashionable topic of research. The challenges include how to model, exploit, store, and analyze social content data. In this talk the speaker will discuss novel approaches to exploiting such data in order to (1) build new types of recommender systems and (2) discover and understand the evolution of topics over time.
Speaker: Puya Hossein Vahabi, Yahoo Labs Barcelona
Since the 1970s, urban theories proposed by Lynch and Milgram have aimed at understanding complex city dynamics. Can these theories be put to use for enabling new mobile services? The answer is a definitive “Yes!”. Existing mapping technologies return the shortest directions. To complement them, we are designing new mobile phone tools that return directions that are not only short but also make pedestrians happier. To capture a fuzzy concept such as happiness, we have combined Flickr metadata with the crowdsourcing site urbangems.org, which I co-designed with colleagues at the University of Cambridge. This site crowdsources visual perceptions of quiet, beauty and happiness across the city of London using pictures of street scenes.
Speaker: Daniele Quercia, Yahoo Labs Barcelona
The problem of understanding the dynamics of collective attention has been identified as a key scientific challenge for the information age. In this talk, we first show that search behaviors of large populations of Internet users evolve in a highly regular manner and that the corresponding time series can be modeled using skewed distributions. We then ask whether such dynamics could be explained in terms of infectious processes that take place in social networks, and derive a physically plausible model for the temporal dynamics of graph diffusion processes. Our results are based on maximum entropy arguments and provide new approaches to problems in network analysis and mining.
Speaker: Christian Bauckhage, Fraunhofer IAIS
Online participatory media, such as social networking sites and forums, changed Internet users from simple information consumers to active producers of online content, turning our societies into "digital democracies". As part of a Swiss SNF funded project, we explore the dynamics of polarization of opinions and social structures through the digital traces left by politicians and voters in online participatory media.
Our first study focuses on Politnetz, a Swiss platform for political activity, composed of support links between politicians, comments, and likes. We analyzed network polarization as the level of intra-party cohesion with respect to inter-party cohesion, finding that supports show a very strongly polarized structure with respect to party alignment. We found that comment structures follow topics related to Swiss politics, and that polarization in likes evolves over time, increasing when the federal elections of 2011 were close. Furthermore, we analyzed the internal social structure of each party through social network metrics related to hierarchical structures and information efficiency. This analysis highlights patterns in the relation between the connectivity of parties and their political position within a multi-party system. Our second work analyzes the evolution of the 15M movement through its digital traces in the Twitter social network. We analyzed the tweets related to the movement during 30 days around its creation, providing an illustration of the evolution and structure of the movement at its collective and individual level. We found patterns of influence of collective action and mass media in the polarization of opinions about the movement, and found different stages of movement formation and expansion through Twitter activity. Our sentiment and psycholinguistic analysis of the content of tweets reveals that activity cascades with strong negative sentiment and social-related terms spread to larger numbers of users. At the individual level, we found that users who are more embedded in the movement display higher levels of activity and express stronger negativity, in line with the overall negative context of the movement.
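The idea of measuring polarization as intra-party cohesion relative to inter-party cohesion, as in the first study above, can be sketched in Python as follows. This is a deliberately simplified stand-in and not the measure used in the study: it treats ties as undirected, defines cohesion as plain edge density, and uses invented politicians and ties. The function name `cohesion` is hypothetical.

```python
from itertools import combinations


def cohesion(edges, party):
    """Return (intra_density, inter_density) for an undirected network.

    `edges` is an iterable of node pairs; `party` maps node -> party label.
    """
    edge_set = {frozenset(edge) for edge in edges}
    intra_pairs = inter_pairs = intra_edges = inter_edges = 0
    for a, b in combinations(sorted(party), 2):
        linked = frozenset((a, b)) in edge_set
        if party[a] == party[b]:
            intra_pairs += 1
            intra_edges += linked
        else:
            inter_pairs += 1
            inter_edges += linked
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Hypothetical politicians, party labels, and ties (toy data).
party = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),  # ties within party A
    ("b1", "b2"), ("b2", "b3"),                # ties within party B
    ("a3", "b1"),                              # a single cross-party tie
]
intra, inter = cohesion(edges, party)
print(f"intra-party density: {intra:.3f}, inter-party density: {inter:.3f}")
```

A large gap between the two densities indicates that ties concentrate within parties, which is the intuition behind a polarized network structure.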
Speaker: David Garcia, ETH Zürich
Though online social network research has exploded in recent years, not much thought has been given to exploring the nature of the social structures that compose these networks. Online interactions have been interpreted as indicative of one social process or another (e.g., status exchange or trust), often with little systematic justification regarding the relation between observed data and theoretical concept. Our research aims to bridge this gap in computational social science by trying to explain the nature and purpose of social structures with quantitative metrics that are directly derived from longstanding concepts in the social sciences. In this talk we will discuss the characterization of social links and social groups. We propose a method based on Blau's notion of resource exchange that discovers, with high accuracy, the fundamental domains of interaction occurring over links in social networks. By applying this method to two online datasets that differ in scope and type of interaction (aNobii and Flickr), we observe the spontaneous emergence of three domains of interaction representing the exchange of status, knowledge and social support. By finding significant relations between the domains of interaction and classic social network analysis issues (e.g., tie strength, dyadic interaction over time), we show how the network of interactions induced by the extracted domains can be used as a starting point for more nuanced analysis of online social data that may one day incorporate the normative grammar of social interaction. We also explore the nature of online groups through the lens of common identity and common bond theory, defining a set of features to classify groups into those two categories. We show that the classification works with high accuracy on Flickr groups.
Speaker: Luca Maria Aiello, Yahoo Labs Barcelona
Wikipedia is the largest encyclopaedia in the world and seeks to "create a summary of all human knowledge". An encyclopaedia is supposed to contain a collection of objective facts, reported by secondary sources. However, the crowdsourced nature of Wikipedia makes it a source of information in itself, reflecting the interests, preferences, opinions, and priorities of the members of its community of editors. By analysing the editorial conflicts between the editors of different language editions, we can create interesting images of each language community's interests and concerns. Moreover, the page view statistics of Wikipedia articles provide a unique insight into the patterns of information seeking by its readers. In this presentation, we start with Wikipedia edit wars and discuss what the warring patterns can teach us about real-life facts. We then show three examples in which statistics of editorial activity and page views serve as proxies for the popularity and visibility of items. The movie market, elections, and scientific reputation are the three topics we have investigated; we observed that, under certain conditions, there is a high correlation between popularity and the volume of Wikipedia edits and page views. Based on these correlations, and given external data to calibrate a predictive model, one can forecast the prospective success of an item reasonably accurately.
Speaker: Taha Yasseri, Oxford Internet Institute
FuturICT is a global initiative pursuing a participatory approach, integrated across the fields of ICT, the social sciences and complexity science, to design socio-inspired technology and develop a science of global, socially interactive systems. The initiative wants to bring together, on a global level, Big Data, new modelling techniques and new forms of interaction, leading to a new understanding of society and its co-evolution with technology. The goal is to create a major scientific drive to understand, explore and manage our complex, connected world in a more sustainable and resilient manner.
The initiative is motivated by the fact that ubiquitous communication and sensing blur the boundaries between the physical and digital worlds, creating unparalleled opportunities for understanding the socio-economic fabric of our world, and for empowering humanity to make informed, responsible decisions for its future. The intimate, complex and dynamic relationship between global, networked ICT systems and human society directly influences the complexity and manageability of both. This also opens up the possibility to fundamentally change the way ICT will be designed, built and operated, reflecting the need for socially interactive, ethically sensitive, trustworthy, self-organized and reliable systems.
It is planned to build a new public resource - value-oriented tools and models to aggregate, access, query and understand vast amounts of data. Information from open sources, real-time devices and mobile sensors would be integrated with multi-scale models of the behaviour of social, technological, environmental and economic systems, which could be interrogated by policy-makers, business people and citizens alike. Together, these would build an eco-system leading to new business models, scientific paradigm shifts and more rapid and effective ways to create and disseminate new knowledge and social benefits - thereby forming an innovation accelerator.
Speaker: Dirk Helbing, ETH Zürich
The increasing availability of data across different socio-technical systems, such as online social media, mobile phone networks, and collaborative knowledge platforms, presents novel challenges and intriguing research opportunities. As more online services permeate our everyday life and as data from various domains are connected and integrated with each other, the boundary between the ‘real world’ and the ‘virtual online world’ becomes blurry. Scholars from different fields now have rich sources of information on individual behaviors at a scale that only a decade ago was hardly conceivable. Such data cover both online and offline activities of people, as well as multiple time scales, prompting a variety of research questions on human behaviors and activities in the real and online worlds. In this talk I will discuss two examples of how online and offline worlds interact and affect each other. In the first case, I'll show how online conversation on Twitter triggers and responds to real-world events in the context of the Occupy Wall Street social mobilization. The second example, in turn, will illustrate how human mobility affects topics of discussion on such online platforms: I'll draw a parallel between information diffusion and epidemic spreading, showing that the dynamics driving the emergence of collective attention and trends are tightly interconnected with individuals' mobility in the real world.
Speaker: Emilio Ferrara, Indiana University Bloomington
Nowadays Internet users rely heavily on many information filters, in the guise of search engines, recommender systems, and the people they follow on social media, to filter out unfavorable information and seek agreeable opinions. Critics have warned that these filters isolate people in echo chambers of their own beliefs, culture and ideology. As a result, we may see increasing social fragmentation and political polarization in our society. This talk will discuss recent Human-Computer Interaction (HCI) research that attempts to mitigate the problem. We conduct user studies and analyze user-generated data on the Web to understand how people behave cognitively and socially in ideologically diverse online environments. By leveraging such knowledge and relevant psychological theories, we design interfaces that encourage the seeking of diverse opinions and stimulate cross-ideology conversations.
Speaker: Q. Vera Liao, University of Illinois at Urbana-Champaign
The microblogging service Twitter is increasingly becoming a tool for political communicators and the public in various countries to communicate about politics. This is also true in Germany, which offers an interesting case for analyzing the impact of Twitter on political communication. Since 2009, Twitter has been a central tool in election campaigns, political activism, the self-marketing of politicians, and the media coverage of campaigns. The talk will address recent research on the use of Twitter during the campaigns for the federal elections of 2009 and 2013 and during one of the biggest political protests in Germany’s recent past, the protests against the infrastructure project Stuttgart 21. Questions addressed in this talk will be: Do political Twitter messages predict the results of elections in Germany? Which kinds of political events lead to spikes in Twitter messages commenting on politics? And how do politically vocal Twitter users use Twitter during campaigns?
Speaker: Andreas Jungherr, University of Konstanz
Scholars of media and communication sciences study the role of media in our society. They frequently search through media archives to select items that cover a certain event. When this is done for large time spans and across media outlets, however, the task can be challenging and laborious. The interdisciplinary project PoliMedia aims to stimulate and facilitate large-scale, cross-media analysis of the coverage of political events. We focus on the meetings of the Dutch parliament, and provide automatically generated links between the transcripts of those meetings, newspaper articles (including their original layout on the page), and radio bulletins. Via a web application, users are able to search through the debates and find related coverage in various media outlets, facilitating a more efficient search process and analysis of the media coverage. Furthermore, the generated links are available in an online database, allowing quantitative analyses with complex, structured queries.
Speaker: Laura Hollink, Centrum Wiskunde & Informatica