GESIS Web Data

What is the GESIS Web Data Service?

The Web Data for the Social Sciences service acts as an umbrella for different activities around collecting digital behavioral data from the Web, especially from online platforms, including social media. It serves as an entry point to long-term samples from specific platforms (such as Twitter or Telegram) and to additional data offers specifically prepared to enable research on current topics of societal relevance or on acute events.

We are currently working on the implementation of new data collections. The first two will be a continuous crawl of Telegram channels and a collection of social media content and advertisements, as well as search engine data, relating to the German candidates for the 2024 European Parliament election. In general, the selection of platforms and topics is based on their relevance for social science research, technical feasibility, legal and ethical considerations, and community input.

Existing datasets

Here you will find an overview of the digital behavioral data (DBD) datasets already available at GESIS.
Additional datasets can be found in the thematic GESIS data collection “Digital Behavioral Data” via the GESIS Search.

Continuous long-term crawls
Tweet 1%-Random Sample Archive
Telegram channels stream (under development)

Learn more

The Web Data for the Social Sciences service consists of three components:

Data Collection
Continuous or time-limited topical data collections from different platforms

Data Offers
Specific data offers created from the different web data collections and made accessible to the community via existing access channels (e.g., data download via the catalog or access via the Secure Data Center) or future ones (such as secure remote access or APIs).

Community Engagement
Community engagement activities, such as workshops, hackathons, and user surveys, to capture user needs and feedback and to provide support.

There are several reasons why the Web Data for the Social Sciences service is valuable for the research community:

  • Independence from commercial third parties, whose interests do not necessarily align with open science principles and who may change access modalities at any point in time.
  • Continuous collection of Web data streams ensures that historic data is accessible on any emerging topic. Researchers then do not have to rely on post-hoc data collection, which can only start after a particular event or topic has been identified, by which point historic data may have been deleted or its collection constrained by platform APIs.
  • Resources for large-scale and/or continuous collections of Web data are often not available to individual researchers or research projects (especially smaller ones). An infrastructure institute like GESIS, however, is able to carry out such tasks preemptively.
  • Persistence and long-term availability of data are crucial requirements for reproducibility and reusability. Both are supported by relying on public data archives, where the data used is archived for research purposes and where transparency can be ensured about both the data used and the methods applied for retrieval, sampling, and interpretation.