Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Authors: Chengqian Gao, Haonan Li, Liu Liu, Zeke Xie, Peilin Zhao, Zhiqiang Xu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through systematic experimentation, we validate this principle with three key findings... Building on this principle, we introduce Selective DPO, which filters out overly difficult examples. This simple adjustment improves alignment performance by 9-16% in win rates on the AlpacaEval 2 benchmark compared to the DPO baseline, surpassing a series of DPO variants with different algorithmic adjustments. These results together illuminate the importance of aligning data difficulty with model capacity, offering a transformative perspective for improving alignment strategies in LLMs. Code is available at https://github.com/glorgao/SelectiveDPO
Researcher Affiliation | Collaboration | ¹MBZUAI, ²Tencent Inc., ³HKUST (Guangzhou), ⁴SJTU. Correspondence to: Liu Liu <EMAIL>, Peilin Zhao <EMAIL>, Zhiqiang Xu <EMAIL>.
Pseudocode | Yes | Appendix A: Pseudocode for the Instantiated Algorithm: Selective DPO (Algorithm 1).
Open Source Code | Yes | Code is available at https://github.com/glorgao/SelectiveDPO
Open Datasets | Yes | We use UltraFeedback-binarized², where darker colors indicate more training steps needed for model comprehension. Results from 10 runs show consistent learning order across different models (Jiang et al., 2023; AI@Meta, 2024; Team et al., 2024) varying in size (2B–9B), training stage, and data sampling. This consistency confirms that examples vary in difficulty, allowing us to discuss difficult examples without debating various definitions of difficulty. ²https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized. We use UltraFeedback-binarized, a widely adopted alignment dataset (Tunstall et al., 2023; Meng et al., 2024; Zhou et al., 2024; Pattnaik et al., 2024), and Argilla dpo-mix-7k³, a small but high-quality dataset. ³https://huggingface.co/datasets/argilla/dpo-mix-7k
Dataset Splits | Yes | To compute the validation loss, we partition D equally into D̂ and D \ D̂, train on one partition, evaluate on the other, and finally output average results over three runs. ... The training dataset is randomly split into two partitions. ... The easiest examples, comprising the lowest τ percent of validation losses, are selected for alignment training. ... For the evaluation in the next section, we set τ = 50 for the UltraFeedback-binarized dataset, based on insights from Figure 3.
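The selection step quoted above (keep the lowest τ percent of examples by validation loss) can be sketched as follows. This is a minimal illustration; the function name `select_easiest` and its interface are assumptions, not the authors' released code.

```python
import numpy as np

def select_easiest(val_losses, tau=50):
    """Return indices of the easiest tau percent of examples,
    i.e. those with the lowest cross-partition validation loss."""
    losses = np.asarray(val_losses, dtype=float)
    k = int(len(losses) * tau / 100)   # number of examples to keep
    order = np.argsort(losses)          # easiest (lowest loss) first
    return np.sort(order[:k])           # keep original dataset order

# with tau = 50, half of the dataset survives for alignment training
kept = select_easiest([0.9, 0.1, 0.5, 0.3], tau=50)
```

In the paper's setup the losses themselves come from training on one random half of the dataset and evaluating on the other, averaged over three runs, before this percentile filter is applied.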
Hardware Specification | Yes | All training experiments in this paper were conducted on compute nodes equipped with 8 H100 GPUs. To facilitate reproduction with limited computational resources, we also provide key benchmarking results for selected models trained using 4 A100 40G GPUs with LoRA.
Software Dependencies | No | The paper does not provide specific software names with version numbers, such as Python or specific library versions.
Experiment Setup | Yes | Following prior work, we set β = 0.01 (Zhou et al., 2024). The learning rate is swept for DPO with random ordering and directly applied to DPO with other settings. We conduct the alignment with one epoch following Meng et al. (2024). ... Appendix C.2 (SFT Hyper-Parameters) and C.3 (Key Hyper-Parameters for Alignment) provide detailed tables (Tables 2–6) listing Batch Size, Learning Rate, Epoch, and Optimizer for various models and datasets. For example, Table 2 lists 'Batch Size 128', 'Learning Rate 2e-5', 'Epoch 1', 'Optimizer Adam'. Appendix C.4 (LoRA Configuration for Alignment) details 'lora_alpha 16', 'lora_dropout 0.05', 'lora_target_modules q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj'.
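For context on the β = 0.01 setting, the standard DPO objective (Rafailov et al., 2023) for a single preference pair can be sketched as below. The log-probabilities are sequence log-likelihoods under the trained policy and the frozen reference model; this is an illustrative sketch, not the paper's training code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.01):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin is the implicit reward gap between the chosen
    and rejected responses relative to the reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With β = 0.01 the margin is scaled down heavily, so the policy is only weakly pushed away from the reference model; a zero margin gives the chance-level loss log 2.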