Position: When Incentives Backfire, Data Stops Being Human

Authors: Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey R Allen, Sewoong Oh

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Position: When Incentives Backfire, Data Stops Being Human. Abstract: Progress in AI has relied on human-generated data... We argue that this issue goes beyond the immediate challenge... We propose that rethinking data collection systems to align with contributors' intrinsic motivations... In this paper, we analyze the current data requirements in machine learning and how existing data collection systems attempt to meet them... drawing on foundational theories and experiments in the social sciences, particularly psychology and economics.
Researcher Affiliation | Academia | (1) University of Washington, USA; (2) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Republic of Singapore; (3) Princeton University, USA; (4) University of British Columbia.
Pseudocode | No | The paper is a position paper and does not present any algorithms or pseudocode.
Open Source Code | No | The paper is a position paper and does not describe any methodology for which source code would be provided or made available.
Open Datasets | No | The paper discusses various existing datasets (e.g., ImageNet, Wikipedia, Reddit, Common Crawl) as examples of data sources, but it does not introduce a new dataset or use specific datasets for its own empirical evaluation. No concrete access information is provided for any dataset used in the paper's research.
Dataset Splits | No | The paper is theoretical and conceptual, and does not conduct experiments requiring dataset splits.
Hardware Specification | No | The paper is theoretical and conceptual, and does not describe any experiments that would require specific hardware specifications.
Software Dependencies | No | The paper is theoretical and conceptual, and does not describe any experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and conceptual, and does not describe any experiments that would involve hyperparameter values or training configurations.