What is GESIS Web Data?
The GESIS Web Data service acts as an umbrella for activities around collecting digital behavioral data from the Web, especially from online platforms, including social media. It serves as an entry point to long-term samples from specific platforms (such as Twitter or Telegram) and to additional data offerings specifically prepared to enable research on current topics of societal relevance or on acute events.
We are currently implementing new data collections. The first two will be a continuous crawl of Telegram channels and a collection of social media content and advertisements, as well as search engine data, from the German candidates for the 2024 European Parliament election. In general, the selection of platforms and topics is based on their relevance for social science research, technical feasibility, legal and ethical considerations, and community input.
Existing datasets
Digital behavioral data (DBD) datasets in our archive can be found in the thematic GESIS data collection “Digital Behavioral Data” via the GESIS Search.
The service GESIS Web Data consists of three components:
What are the benefits of this service for the social science research community?
There are several reasons why the service GESIS Web Data is valuable for the research community:
- Independence from commercial third parties, whose interests do not necessarily align with open science principles and who may change access modalities at any point in time.
- Continuous collection of Web data streams ensures that historic data is accessible on any emerging topic. Researchers then do not have to rely on post-hoc data collection, which can only start after a particular event or topic has been identified, by which time historic data may have been deleted or its collection may be constrained by platform APIs.
- Resources for large-scale and/or continuous collections of Web data are often not available to individual researchers or research projects (especially smaller ones). An infrastructure institute like GESIS, however, is able to carry out such tasks preemptively.
- Persistence and long-term availability of data are crucial requirements for reproducibility and reusability. Public data archives support both: the data used is archived for research purposes, and transparency can be ensured about both the data itself and the methods applied for retrieval, sampling, and interpretation.
GESIS Web Data is a service in development
If you have questions, feedback, or wish to collaborate, please e-mail us.
Team