reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Position: When Incentives Backfire, Data Stops Being Human

Authors: Sebastin Santy, Prasanta Bhattacharya, Manoel Horta Ribeiro, Kelsey R Allen, Sewoong Oh

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Position: When Incentives Backfire, Data Stops Being Human. Abstract: Progress in AI has relied on human-generated data... We argue that this issue goes beyond the immediate challenge... We propose that rethinking data collection systems to align with contributors' intrinsic motivations... In this paper, we analyze the current data requirements in machine learning and how existing data collection systems attempt to meet them... drawing on foundational theories and experiments in the social sciences, particularly psychology and economics.
Researcher Affiliation	Academia	(1) University of Washington, USA; (2) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Republic of Singapore; (3) Princeton University, USA; (4) University of British Columbia.
Pseudocode	No	The paper is a position paper and does not present any algorithms or pseudocode.
Open Source Code	No	The paper is a position paper and does not describe any methodology for which source code would be provided or made available.
Open Datasets	No	The paper discusses various existing datasets (e.g., ImageNet, Wikipedia, Reddit, Common Crawl) as examples of data sources, but it does not introduce a new dataset or use specific datasets for its own empirical evaluation. No concrete access information for a dataset used in this paper's research is provided.
Dataset Splits	No	The paper is theoretical and conceptual, and does not conduct experiments requiring dataset splits.
Hardware Specification	No	The paper is theoretical and conceptual, and does not describe any experiments that would require specific hardware specifications.
Software Dependencies	No	The paper is theoretical and conceptual, and does not describe any experiments that would require specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and conceptual, and does not describe any experiments that would involve hyperparameter values or training configurations.