reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Position: Challenges and Future Directions of Data-Centric AI Alignment

Authors: Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this position paper, we highlight key challenges associated with both human-based and AI-based feedback within the data-centric alignment framework. Through qualitative analysis, we identify multiple sources of unreliability in human feedback... We conduct an in-depth qualitative study using a subset of data from the popular Anthropic-HH dataset (Bai et al., 2022a)...
Researcher Affiliation	Academia	Department of Computer Science, University of Wisconsin-Madison, WI, USA. Correspondence to: Min-Hsuan Yeh <EMAIL>, Yixuan Li <EMAIL>.
Pseudocode	No	The paper describes a qualitative analysis and proposes future research directions, but it does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets	Yes	We conduct an in-depth qualitative study using a subset of data from the popular Anthropic-HH dataset (Bai et al., 2022a), where each question is paired with two responses: chosen and rejected by humans.
Dataset Splits	No	We randomly sample 80 data points from both harmless split and helpful split of Anthropic-HH dataset, and hire three annotators to re-label these 160 samples and record their thoughts and criteria during the annotation process. The paper describes how a subset was sampled for their qualitative analysis, but it does not provide specific training/test/validation splits for machine learning experiments.
Hardware Specification	No	The paper describes a qualitative analysis and proposes future research directions; it does not report on computational experiments that would require specific hardware specifications.
Software Dependencies	No	The paper discusses concepts and qualitative analysis, not the implementation of a software system with specific dependencies and version numbers.
Experiment Setup	No	The paper describes a qualitative study and its annotation setup in Appendix A, but it does not include details such as hyperparameters, optimizer settings, or system-level training configurations, as it does not involve training models.