Session 4️⃣: Bias & Outro
Frieder Rodewald (University of Mannheim) & Valerie Hase (LMU Munich)
👉 Part of the SPP DFG Project Integrating Data Donations in Survey Infrastructure
Bias in Data Donation Studies
What’s Next for Data Donation?
Outro
Image by Hope House Press via Unsplash
Source: Image by Markus Winkler via Unsplash
Definition 💡: Deviations from the true value of a theoretical concept introduced by its measurement (Peytchev, 2013)
👉 Bias can influence descriptive results but also attenuate/inflate inferential conclusions.
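A minimal sketch of the attenuation case, using simulated data only: random measurement error in a usage variable pulls its correlation with an outcome toward zero. All names and values (`true_use`, `outcome`, the effect size of 0.5, the error variance) are illustrative assumptions, not taken from any of the cited studies. Systematic (non-random) error can instead inflate estimates, which this sketch does not cover.

```python
import random

def pearson(xs, ys):
    """Pearson correlation computed from scratch (standard library only)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical "true" usage and an outcome that depends on it.
true_use = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.5 * x + rng.gauss(0, 1) for x in true_use]

# Measured usage = true usage + random measurement error.
measured_use = [x + rng.gauss(0, 1) for x in true_use]

r_true = pearson(true_use, outcome)
r_measured = pearson(measured_use, outcome)

print(f"correlation with true usage:     {r_true:.2f}")
print(f"correlation with measured usage: {r_measured:.2f}")
```

The second correlation should come out visibly smaller than the first, even though the underlying relationship is unchanged.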
Source: Image from Boeschoten et al., 2022, p. 396
For example …
Coverage error: Who is (not) represented in the sampling frame? (e.g., social media users vs. YouTube users)
Sampling error: Who is (not) represented in the sample? (e.g., non-probability samples)
Non-response error: Who does (not) want to participate in the data donation?
Compliance error: Who is (not) able to participate in the data donation?
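One way to put a number on non-response or compliance error, sketched under the assumption that a characteristic (here: age from a preceding survey) is known for everyone who was invited, donors and non-donors alike. The bias of the donor-only mean equals the non-donor share times the difference between the donor and non-donor means. The function name and the ages below are hypothetical.

```python
def nonresponse_bias(respondents, nonrespondents):
    """Bias of the respondent-only mean relative to the full invited sample.

    bias = (share of nonrespondents) * (mean of respondents - mean of nonrespondents)
    """
    n_r, n_nr = len(respondents), len(nonrespondents)
    mean_r = sum(respondents) / n_r
    mean_nr = sum(nonrespondents) / n_nr
    nonresponse_rate = n_nr / (n_r + n_nr)
    return nonresponse_rate * (mean_r - mean_nr)

# Hypothetical ages of invited participants, split by donation status.
donor_ages = [22, 25, 31, 28]
nondonor_ages = [45, 52, 38, 61, 47, 39]

bias = nonresponse_bias(donor_ages, nondonor_ages)
print(f"bias in mean age if only donors are analysed: {bias:.1f} years")
```

In this made-up example, donors are younger than non-donors, so an analysis based on donors alone would underestimate the mean age of the invited sample.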
What do you think: Which participant characteristics may correlate with non-response or non-compliance? 🤔
Example study by Hase & Haim (2024):
Source: Figure from Hase & Haim (2024)
Any ideas (from your discipline): How can we quantify/address errors in representation? 🤔
A priori strategies:
Infrastructure: Integration in probability-based panels
Survey design strategies (e.g., incentives, study framing)
DDT design (e.g., UX perspective)
Post hoc strategies:
For now: limited studies, limited success of existing solutions
Source: Figure from Hase & Haim (2024)
What do you think: How could errors in measurements sneak into data donation studies? 🤔
For example …
Construct (in-)validity: How do DDP variables relate to latent constructs? (e.g., likes vs. political participation)
Measurement error: How correct is data in our DDP? (e.g., missing data)
Extraction error: Did we extract all relevant files and variables?
Example study by Valkenburg et al. (2024):
Source: Figure from Valkenburg et al. (2024)
Any ideas (from your discipline): How can we quantify/address errors in measurements? 🤔
A priori strategies:
Talk to everyone (e.g., IRB, Data Steward)
Repeated testing & DDP download
Simulate downstream errors (Bosch et al., 2024)
Post hoc strategies:
Multiverse approaches
Statistical error correction (TeBlunthuis et al., 2024)
Error documentation (Gebru et al., 2021)
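A rough sketch of what a multiverse approach can look like in practice: the same estimate is computed under every combination of defensible preprocessing choices, and the spread of results is reported instead of a single number. The data, the two decisions (how to treat missing days, whether to cap outliers), and the cap of 300 minutes are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean

# Hypothetical daily watch time in minutes; None marks days missing from the DDP.
daily_minutes = [30, 45, None, 600, 20, None, 50]

def handle_missing(values, strategy):
    """Either drop missing days or treat them as days without any use."""
    if strategy == "drop":
        return [v for v in values if v is not None]
    if strategy == "zero":
        return [0 if v is None else v for v in values]
    raise ValueError(f"unknown strategy: {strategy}")

def handle_outliers(values, cap):
    """Optionally cap implausibly high values at `cap` minutes."""
    return values if cap is None else [min(v, cap) for v in values]

missing_strategies = ["drop", "zero"]
outlier_caps = [None, 300]

# One estimate per combination of analytic choices.
results = {}
for strategy, cap in product(missing_strategies, outlier_caps):
    cleaned = handle_outliers(handle_missing(daily_minutes, strategy), cap)
    results[(strategy, cap)] = mean(cleaned)

for (strategy, cap), estimate in results.items():
    print(f"missing={strategy:<4} cap={cap!s:<4} mean={estimate:.1f}")
```

If the estimates differ a lot across specifications, as they do with this toy data, conclusions depend heavily on preprocessing decisions and should be reported with that caveat.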
Example study by Hase et al. (2024):
Source: Figure from Hase et al. (2024)
Questions? 🤔
Source: Image by Markus Winkler via Unsplash
Multimodal & cross-platform data 📸 (Wedel et al., 2024)
In-tool, local classification (e.g., local SML/LLMs?)
Workflow/UX-perspective
Source: Image by DariuszSankowski via Pixabay
Platforms do not (willingly?) provide data as required by the GDPR/DSA (Hase et al., 2024)
The EU has started to sanction platforms like X/TikTok
DSA may become the subject of larger geo-political debates with the USA (Seiling et al., 2025)
Source: Image by WilliamCho via Pixabay
Can the method actually be applied for empirical research? (few examples, like Thorson et al., 2021; Wojcieszak et al., 2024)
Requires interdisciplinary perspectives (e.g., addressing bias, integration in probability-based panels)
Source: Image by Vladislav Babienko via Pixabay
Questions? 🤔
Source: Image by Markus Winkler via Unsplash
👉 Please fill out this 3-minute feedback form: https://forms.gle/KLMweywhW7odGyfk8
QR code for survey
Thanks for joining the workshop 🙌
Data Donation Studies - COMPTEXT - Frieder Rodewald, Valerie Hase