Session 4️⃣: Bias in Digital Trace Data & Outro
👉 Part of the SPP DFG Project Integrating Data Donations in Survey Infrastructure
Bias in Data Donation Studies
What’s Next for Data Donation?
Summary & Evaluation
Image by Hope House Press via Unsplash
Source: Image by Markus Winkler via Unsplash
Definition💡: the systematic difference between the true value of a quantity for a population and how a study observes it (Hase et al., in press)
In CSS, the bias-variance tradeoff plays an important role: Often, we can reduce either the bias or the variance of a model, but not both at once.
Bias–variance tradeoff
Source: Scott Fortmann-Roe (2012)
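For reference, the tradeoff is commonly written as a decomposition of an estimator's mean squared error. The notation is an assumption added here and does not come from the slides: θ stands for the true population value and θ̂ for the value a study estimates.

```latex
\mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
\qquad
\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta})
```

For a fixed total error, lowering one term tends to raise the other, which is the tradeoff named above.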
Please think about bias in and through CSS 🤔
Bias is an underestimated problem in CSS (Hase et al., 2025; Kathirgamalingam, Kulichkina, et al., 2025)

Source: The Guardian, 2024

Source: The Verge, 2024
Special Issue in Communication Methods and Measures
Source: Image from Boeschoten et al., 2022, p. 396
For example …
Coverage error: Who is (not) represented in the sampling frame? (e.g., social media users vs. YouTube users)
Sampling error: Who is (not) represented in the sample? (e.g., non-probability samples)
Non-response error: Who does (not) want to participate in the data donation?
Compliance error: Who is (not) able to participate in the data donation?
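To make errors in representation more tangible, here is a minimal simulation sketch of non-response error. Everything in it is a made-up assumption for illustration: the population, the usage distribution, and the participation model in `donates`, where heavy users are more likely to donate. It only shows how selective participation shifts an estimate away from the population value.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: daily minutes of platform use per person.
population = [max(0.0, random.gauss(60, 20)) for _ in range(100_000)]


def donates(minutes: float) -> bool:
    """Assumed participation model: heavy users donate more often."""
    p = 0.25 if minutes > 60 else 0.05
    return random.random() < p


# Only people who agree to donate end up in the sample.
donors = [m for m in population if donates(m)]

true_mean = sum(population) / len(population)
donor_mean = sum(donors) / len(donors)
bias = donor_mean - true_mean  # systematic difference: estimate minus truth

print(f"Population mean: {true_mean:.1f} min")
print(f"Donor mean:      {donor_mean:.1f} min")
print(f"Bias:            {bias:+.1f} min")
```

Because participation depends on the very quantity being measured, collecting more donations under the same mechanism would not shrink this bias.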
Which aspects of the research design or participant characteristics may correlate with participants dropping out of data donation studies? 🤔
Example study by Hase & Haim (2024):
Source: Figure from Hase & Haim (2024)
Literature review by Xiong et al. (2025) and own experiences
Research design
Participant characteristics
Any ideas (from your discipline): How can we quantify/address errors in representation? 🤔
Methods for bias detection often draw from validation strategies, though this may not be enough (Hase et al., 2025)
👉 “a more pragmatic vision of bias detection: one that abandons the pursuit of perfect benchmarks in favor of comparative assessments of biases across CSS and non-CSS methods.” (Hase et al., 2025, p. 5)
A priori strategies:
Infrastructure: Integration in probability-based panels
Learning from survey design strategies (e.g., incentives, study framing) (Hase & Haim, 2024)
DDT design (e.g., UX perspective)
Post hoc strategies:
For now: limited studies, limited success
Source: Figure from Hase & Haim (2024)
What do you think: How could errors in measurements sneak into data donation studies? 🤔
For example …
Construct (in-)validity: How do DDP variables relate to latent measurements? (e.g., likes vs. political participation)
Measurement error: How correct is data in our DDP? (e.g., missing data)
Extraction error: Did we extract all relevant files and variables?
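A minimal sketch of why measurement error matters downstream, assuming classical (purely random) error in a DDP-derived usage variable. The variable names, the effect size `true_slope`, and the error variance are hypothetical. Real DDP errors such as missing data may be systematic rather than random and can behave differently.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

n = 20_000
true_slope = 0.5  # assumed effect of true usage on the outcome

# True (unobserved) usage and an outcome that depends on it.
usage_true = [random.gauss(0, 1) for _ in range(n)]
outcome = [true_slope * u + random.gauss(0, 1) for u in usage_true]

# Observed usage from the DDP: true usage plus random measurement error.
usage_observed = [u + random.gauss(0, 1) for u in usage_true]


def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


slope_true = ols_slope(usage_true, outcome)
slope_observed = ols_slope(usage_observed, outcome)

print(f"Slope with true usage:     {slope_true:.2f}")
print(f"Slope with observed usage: {slope_observed:.2f}")
```

Under these assumptions, the error variance equals the variance of true usage, so the slope estimated from the observed variable shrinks toward zero by roughly half.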
Example study by Valkenburg et al. (2024):
Source: Figure from Valkenburg et al. (2024)
Any ideas (from your discipline): How can we quantify/address errors in measurements? 🤔
A priori strategies:
Talk to everyone (e.g., IRB, data steward)
Repeated testing & DDP download
Simulate downstream errors (Bosch et al., 2024)
Post hoc strategies:
Multiverse approaches
Statistical error correction (TeBlunthuis et al., 2024)
Error documentation (Gebru et al., 2021)
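As an illustration of the multiverse idea from the list above, the sketch below runs the same estimate under every combination of a few defensible preprocessing choices and reports how much the result moves. The records, the choices (`missing_rules`, `outlier_caps`, `min_activity`), and the thresholds are all hypothetical. A real multiverse would use the decisions that actually arise in a given study.

```python
from itertools import product
from statistics import mean

# Hypothetical extracted DDP records: (participant_id, daily_minutes or None).
records = [
    ("p1", 12.0), ("p2", 95.0), ("p3", None), ("p4", 40.0),
    ("p5", 310.0), ("p6", 0.0), ("p7", 58.0), ("p8", None),
]

# Defensible analytic choices (illustrative, not exhaustive).
missing_rules = {
    "drop": lambda v: v,                        # keep None, filtered out below
    "zero": lambda v: 0.0 if v is None else v,  # treat missing as no use
}
outlier_caps = [None, 240.0]  # no cap vs. cap at 4 hours per day
min_activity = [0.0, 1.0]     # keep everyone vs. drop inactive accounts


def estimate(missing_rule, cap, floor):
    """Mean daily minutes under one combination of choices."""
    values = [missing_rules[missing_rule](v) for _, v in records]
    values = [v for v in values if v is not None and v >= floor]
    if cap is not None:
        values = [min(v, cap) for v in values]
    return mean(values)


# One estimate per specification: 2 x 2 x 2 = 8 "universes".
multiverse = {
    spec: estimate(*spec)
    for spec in product(missing_rules, outlier_caps, min_activity)
}

for spec, value in multiverse.items():
    print(spec, round(value, 1))
print("Range of estimates:",
      round(min(multiverse.values()), 1), "to",
      round(max(multiverse.values()), 1))
```

Reporting the full range rather than one preferred specification shows how much a conclusion depends on preprocessing decisions.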
In a recent policy paper, around 20 scholars from different CSS labs argued (Hase et al., 2024):
Source: Figure from Hase et al. 2024
Despite my lengthy rant about bias, this is not a statement against data donations.
Just be sure to:
Questions? 🤔
Source: Image by Markus Winkler via Unsplash
Preregistration:
👉 Our recent preregistration includes 70 pages 😭 and we fully simulated results to understand potential decision trees
Preregistration:
Figure. GitHub issues - Testing the tool
Open Data:
Open Materials:
Multimodal & cross-platform data 📸 (Wedel et al., 2025)
Less standardized data (e.g., chatbot or message logs)
In-tool, local classification (e.g., local SML/LLMs?)
Workflow/UX perspective
Source: Image by DariuszSankowski via Pixabay
Platforms do not (willingly?) provide data as required by the GDPR/DSA (Hase et al., 2024)
The EU may sanction platforms like X/TikTok, but exact sanctions remain unclear (see DSA Observatory)
The DSA is subject to larger geopolitical debates (Seiling et al., 2025), in which some politicians falsely claim “censorship” as the reason behind regulations
Recent GDPR Omnibus amendment would make data donation unfeasible (see our open letter to the European Commission)
Source: Image by WilliamCho via Pixabay
Questions? 🤔
Source: Image by Markus Winkler via Unsplash
👉 Please fill out this 3-minute feedback form: https://forms.gle/xaRy2Ldr9mU9jGc3A
Thanks for joining the workshop 🙌
Data Donation Studies - DGPuK RezFo - Valerie Hase