Session 3️⃣: Data Donation Studies (Researcher Perspective)
👉 Part of the SPP DFG Project Integrating Data Donations in Survey Infrastructure
Please come up with 2-3 research questions/hypotheses you may want to answer using data donation. Which methodological decisions would you have to make to answer them? 🤔
Research design & tool set-up
Data cleaning & augmentation
Modelling
📢 Task 3: Example Analysis of YouTube Watch history
Image by Hope House Press via Unsplash
Source: Image by Markus Winkler via Unsplash

Empirical studies on data donation focus on …
Research designs include …
⚠️ Causal inference remains a key problem!
⚠️ Match between theoretical concepts and measurements remains a key problem!
Key questions:
Choose a tool, e.g., …
Relevant questions include…
Key questions:
You can find the following Python code for data extraction here:
Key decisions include:
Key questions:
Please look at your data and discuss: What needs to be anonymized? How could we do this? 🤔
Good anonymization may require…
Figure. Example whitelist
Figure. Example anonymized data
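The whitelist idea shown in the figures can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the field names (`channel`, `title`), the whitelist entry, and the `[REDACTED]` placeholder are all assumptions made for the example. Entries whose channel is on the whitelist are kept as-is; for all other entries, the potentially identifying fields are replaced by a placeholder.

```python
# Minimal sketch of whitelist-based anonymization.
# Field names, whitelist entries, and the placeholder are hypothetical.

PLACEHOLDER = "[REDACTED]"


def anonymize_rows(rows, key_field, redact_fields, whitelist):
    """Return copies of rows; redact fields unless key_field is whitelisted."""
    allowed = {name.strip().lower() for name in whitelist}
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the donated raw data is not mutated
        value = str(row.get(key_field, "")).strip().lower()
        if value not in allowed:
            for field in redact_fields:
                if field in new_row:
                    new_row[field] = PLACEHOLDER
        cleaned.append(new_row)
    return cleaned


# Hypothetical example data
rows = [
    {"channel": "Example News Channel", "title": "Evening news"},
    {"channel": "A Friend's Private Vlog", "title": "Our holiday"},
]
whitelist = ["Example News Channel"]

cleaned = anonymize_rows(
    rows,
    key_field="channel",
    redact_fields=["channel", "title"],
    whitelist=whitelist,
)
for row in cleaned:
    print(row)
```

A fixed placeholder removes the information entirely. If you need to count distinct non-whitelisted channels, you would need a different approach (e.g., pseudonyms), which comes with its own re-identification risks.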
Let’s have a look at the technical set-up 💻:
Figure. Next setup
What are the (dis-)advantages of opt-in online access panels for data donation? 🤔
Low response rates (e.g., Hase & Haim, 2024; Keusch et al., 2024)
Figure. Data donation study - researcher perspective
This is what your data may look like:
Figure. Donated data - example
Often, we need to further preprocess collected data through…
Figure. Data donation study - researcher perspective
👉 You know the drill: We will talk about this in session 4️⃣.
Figure. Data donation study - researcher perspective
📢 Task 3: Example Analysis of YouTube Watch history
Download the “Data for Task 3” from the website or use your own YouTube watch history, and load the corresponding R code. Run the code (you only have to change the location and name of your data file) to answer:
On which day do you mostly watch YouTube?
At what time do you mostly watch YouTube?
The idea for this code and analysis was provided by Michael Scharkow, University of Mainz.
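If you prefer Python, the two questions can be approached with a short sketch like the one below. It is not the workshop's R code, only an illustration under stated assumptions: the watch history is assumed to be a JSON list of entries, each with a `time` field holding an ISO 8601 timestamp in UTC (as in a typical Google Takeout export). The file name in `load_history` and the three example entries are hypothetical.

```python
# Minimal sketch: count watched videos per weekday and per hour.
# Assumes a JSON list of entries with an ISO 8601 "time" field in UTC.
import json
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def load_history(path):
    """Load a watch history file (hypothetical path, e.g. 'watch-history.json')."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_time(value):
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(entries):
    """Return two Counters: views per weekday and views per hour (UTC)."""
    days, hours = Counter(), Counter()
    for entry in entries:
        stamp = entry.get("time")
        if not stamp:
            continue  # skip entries without a timestamp
        t = parse_time(stamp)
        days[WEEKDAYS[t.weekday()]] += 1
        hours[t.hour] += 1
    return days, hours


# Hypothetical example entries; replace with load_history("your-file.json")
example = [
    {"title": "Watched A", "time": "2024-03-04T18:15:02.123Z"},
    {"title": "Watched B", "time": "2024-03-04T18:40:10.000Z"},
    {"title": "Watched C", "time": "2024-03-06T09:05:00.500Z"},
]

days, hours = summarize(example)
print("Most common day:", days.most_common(1))
print("Most common hour (UTC):", hours.most_common(1))
```

Note that the hours are in UTC. To interpret them as viewing times, you would convert the timestamps to the donor's local time zone first.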
For inferential modeling, consider (Clemm von Hohenberg et al., 2024) …
Questions? 🤔
Introduction to Data Donation - TU Ilmenau - Valerie Hase