Session 3️⃣: Data Donation Studies (Researcher Perspective)
👉 Part of the SPP DFG Project Integrating Data Donations in Survey Infrastructure
Please come up with 2-3 research questions/hypotheses you may want to answer using data donation 🤔
To answer these, which methodological decisions would you have to take? 🤔
Research design & tool set-up
Data cleaning & augmentation, including
📢 Task 4: Classify search terms
Modelling
📢 Task 5: Example Analysis of YouTube Watch history
Image by Hope House Press via Unsplash
Source: Image by Markus Winkler via Unsplash

Empirical studies on data donation focus on …
Research designs include …
⚠️ Causal inference remains a key problem!
⚠️ Match between theoretical concepts and measurements remains a key problem!
Key questions:
Choose a tool, e.g., …
Relevant questions include…
Key questions:
You can find the following Python code for data extraction here:
Key decisions include:
Key questions:
Please look at your data and discuss: What needs to be anonymized? How could we do this? 🤔
Good anonymization may require…
Figure. Example whitelist
Figure. Example anonymized data
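Whitelist-based anonymization could be sketched roughly as follows. This is a minimal illustration, not the set-up used in the project: the field names (`channel`, `title`), the whitelist entries, and the placeholder strings are all hypothetical, and the e-mail pattern is deliberately simple.

```python
import re

# Hypothetical whitelist: channel names assumed safe to keep verbatim.
WHITELIST = {"tagesschau", "ZDFheute"}

# Simple (not exhaustive) pattern for e-mail addresses in free text.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize_record(record, whitelist=WHITELIST):
    """Return a copy of one donated record with sensitive parts masked."""
    cleaned = dict(record)
    # Keep channel names only if they are explicitly whitelisted.
    if cleaned.get("channel") not in whitelist:
        cleaned["channel"] = "<REDACTED>"
    # Mask e-mail addresses that appear in the title text.
    cleaned["title"] = EMAIL_PATTERN.sub("<EMAIL>", cleaned.get("title", ""))
    return cleaned


# Hypothetical example records.
records = [
    {"channel": "tagesschau", "title": "Nachrichten vom Abend"},
    {"channel": "Private Vlog", "title": "Write me at jane.doe@example.org"},
]
anonymized = [anonymize_record(r) for r in records]
```

A whitelist keeps only what has been explicitly approved, so unknown values are masked by default. That is usually the safer choice compared to a blacklist of known sensitive terms.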
Let’s have a look at the technical set-up 💻:
Figure. Next setup
For this research question, what are (dis-)advantages of each sample? 🤔
(think about characteristics of the sample, response rates, representativeness, etc.)
Low response rates (e.g., Hase & Haim, 2024; Keusch et al., 2024)
Figure. Data donation study - researcher perspective
This is what your data may look like:
Figure. Donated data - example
Often, we need to further preprocess collected data through…
📢 Task 4: Classify search terms
Download the “Data for Task 4” from the website. It contains YouTube searches from a German social media sample. Either discuss this conceptually or try it in R/Python:
How would you clean the data?
How would you identify health-related searches using manual or automated coding?
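One possible automated approach is a simple keyword dictionary, sketched below. The keyword list and the example searches are hypothetical and are not taken from the Task 4 data. A real study would validate such a dictionary against manually coded searches.

```python
import re

# Hypothetical keyword stems (German and English); substring matching is
# crude and can over-match, so results should be checked manually.
HEALTH_KEYWORDS = ["arzt", "krankheit", "symptom", "impfung", "depression", "health"]
HEALTH_PATTERN = re.compile("|".join(re.escape(k) for k in HEALTH_KEYWORDS))


def clean_search(term):
    """Lowercase the search term and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", term).strip().lower()


def is_health_related(term):
    """Flag a search as health-related if any keyword stem occurs in it."""
    return HEALTH_PATTERN.search(clean_search(term)) is not None


# Hypothetical example searches.
searches = ["  Symptome   Grippe ", "Minecraft Tutorial", "Impfung Nebenwirkungen"]
labels = [is_health_related(s) for s in searches]
```

Comparing these labels with a manually coded subsample (e.g., via precision and recall) indicates whether the dictionary is good enough or whether a different classifier is needed.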
Figure. Donated data - example
Figure. Data donation study - researcher perspective
👉 You know the drill: We will talk about this in session 4️⃣.
Figure. Data donation study - researcher perspective
For inferential modeling, consider (Clemm von Hohenberg et al., 2024) …
📢 Task 5: Example Analysis of YouTube Watch history
Download the “Data for Task 5” from the website or use your own YouTube watch history. Then load the accompanying R code and run it (you only need to change the file path and name of your data):
On which day do you mostly watch YouTube?
At what time do you mostly watch YouTube?
The idea for this code and analysis was provided by Michael Scharkow, University of Mainz.
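For readers who prefer Python, the two questions can be approached with a short sketch like the one below. It is not the R code referenced above. The entries are hypothetical, and it is an assumption that each watch-history entry carries an ISO 8601 timestamp in a `time` field.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical watch-history entries; the "time" field name is an assumption.
watch_history = [
    {"title": "Video A", "time": "2024-03-04T20:15:00Z"},
    {"title": "Video B", "time": "2024-03-04T21:40:00Z"},
    {"title": "Video C", "time": "2024-03-09T10:30:00Z"},
    {"title": "Video D", "time": "2024-03-11T20:05:00Z"},
]


def parse_time(entry):
    """Parse the timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(entry["time"].replace("Z", "+00:00"))


timestamps = [parse_time(e) for e in watch_history]

# Count views per weekday and per hour (hours are in UTC here).
views_per_weekday = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
views_per_hour = Counter(ts.hour for ts in timestamps)

top_weekday, _ = views_per_weekday.most_common(1)[0]
top_hour, _ = views_per_hour.most_common(1)[0]
print(f"Most frequent day: {top_weekday}, most frequent hour: {top_hour}:00 UTC")
```

The timestamps are kept in UTC. For a substantive analysis they would first need to be converted to the participant's local time zone, otherwise the "time of day" results are shifted.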
Questions? 🤔
Data Donation Studies - DGPuK RezFo - Valerie Hase