A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement

Authors: Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang, Liu Leqi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical implications of our framework further extend to explaining important differences in the training dynamics of various preference optimization algorithms and suggesting future directions for improvement. ... We validate these theoretical insights empirically (Section 4.2)."
Researcher Affiliation | Academia | "Hui Yuan 1, Yifan Zeng 2, Yue Wu 3, Huazheng Wang 4, Mengdi Wang 5, Liu Leqi 6; 1,3,5 Princeton University; 2,4 Oregon State University; 6 The University of Texas at Austin"
Pseudocode | No | The paper describes theoretical derivations and empirical validations of existing methods but does not present any new algorithms in pseudocode or an algorithm-block format.
Open Source Code | Yes | "Code for the paper can be found at https://github.com/HumainLab/UnderstandMarginPO."
Open Datasets | Yes | "We conduct experiments on the TL;DR dataset (Stiennon et al., 2020) to showcase the widely-existing phenomenon that the chosen and rejected log-probabilities have synchronized changes during preference optimization." In addition, Figure 1 depicts how different margin-based preference optimization algorithms influence the log-probability of chosen and rejected responses.
Dataset Splits | No | The paper mentions using the TL;DR dataset and a specially curated sentiment dataset, and notes that log-probabilities are averaged on the evaluation set. However, it does not provide specific percentages, sample counts, or a detailed methodology for how the datasets were split into training, validation, and test sets, which would be needed for reproduction.
Hardware Specification | Yes | "The training was performed on a hardware setup consisting of two NVIDIA H100 GPUs, providing substantial computational power for the training process."
Software Dependencies | Yes | "Our experiments were implemented using TRL version 0.11.0."
Experiment Setup | Yes | "To optimize the training process, we applied Low-Rank Adaptation (LoRA) with a rank of 64 to both models. The learning rate was set at 5 × 10^-6 for all RLHF training."
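The setup row above pins down only a few hyperparameters (LoRA rank 64, learning rate 5e-6, TRL 0.11.0). A minimal sketch collecting them into one place follows; the class name `PrefOptConfig` is an assumption for illustration, since the paper's training script is not reproduced here, and no model or dataset paths are fixed.

```python
from dataclasses import dataclass


@dataclass
class PrefOptConfig:
    """Hyperparameters quoted in the reproducibility report.

    Only the values stated in the report are taken as given; the
    class itself is a hypothetical container, not the authors' code.
    """
    lora_rank: int = 64          # LoRA rank applied to both models
    learning_rate: float = 5e-6  # used for all RLHF training
    trl_version: str = "0.11.0"  # TRL release used in the experiments


cfg = PrefOptConfig()
print(cfg.lora_rank, cfg.learning_rate)
```

Collecting the reported values in a dataclass like this makes it straightforward to pass them into a TRL-style trainer configuration when attempting a reproduction.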