Position: When Incentives Backfire, Data Stops Being Human
Authors: Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey R Allen, Sewoong Oh
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Position: When Incentives Backfire, Data Stops Being Human. Abstract: Progress in AI has relied on human-generated data... We argue that this issue goes beyond the immediate challenge... We propose that rethinking data collection systems to align with contributors' intrinsic motivations... In this paper, we analyze the current data requirements in machine learning and how existing data collection systems attempt to meet them... drawing on foundational theories and experiments in the social sciences, particularly psychology and economics. |
| Researcher Affiliation | Academia | (1) University of Washington, USA; (2) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Republic of Singapore; (3) Princeton University, USA; (4) University of British Columbia. |
| Pseudocode | No | The paper is a position paper and does not present any algorithms or pseudocode. |
| Open Source Code | No | The paper is a position paper and does not describe any methodology for which source code would be provided or made available. |
| Open Datasets | No | The paper discusses various existing datasets (e.g., ImageNet, Wikipedia, Reddit, Common Crawl) as examples of data sources, but it does not introduce a new dataset or use specific datasets for its own empirical evaluation. No concrete access information for a dataset used in this paper's research is provided. |
| Dataset Splits | No | The paper is theoretical and conceptual, and does not conduct experiments requiring dataset splits. |
| Hardware Specification | No | The paper is theoretical and conceptual, and does not describe any experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper is theoretical and conceptual, and does not describe any experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and conceptual, and does not describe any experiments that would involve hyperparameter values or training configurations. |