PAD: Personalized Alignment of LLMs at Decoding-time

Authors: Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

ICLR 2025

Reproducibility variables, each with the assessed result and a supporting quote from the paper:
Research Type: Experimental
"Extensive experimental results demonstrate that PAD not only outperforms existing training-based alignment methods in terms of aligning with diverse preferences but also shows significant generalizability to preferences unseen during training and scalability across different base models."
Researcher Affiliation: Academia
"1 Zhejiang University; 2 National University of Singapore; 3 University of Washington"
Pseudocode: Yes
"To better illustrate the practical implementation of PAD as discussed in Section 3.4, which comprises two key components, the optimization of the Personalized Reward Model (PRM) and inference-time guided decoding with token-level personalized rewards, we detail these processes in Algorithms 1 and 2."
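A minimal sketch of what the inference-time guided decoding step might look like: at each position, the base model's top-k candidate tokens are re-scored with a token-level personalized reward, and the candidate maximizing logit + β·reward is emitted greedily. The function and the toy reward below are illustrative assumptions, not the paper's actual Algorithm 2 implementation.

```python
def guided_decode_step(base_logits, personalized_reward, beta=1.0, k=10):
    """Pick the next token greedily from the top-k base-model candidates,
    reweighted by a token-level personalized reward (hypothetical sketch)."""
    # take the k highest-logit candidate token ids
    topk = sorted(range(len(base_logits)),
                  key=lambda t: base_logits[t], reverse=True)[:k]
    # greedy choice over logit + beta * personalized token reward
    return max(topk, key=lambda t: base_logits[t] + beta * personalized_reward(t))

# toy usage: a reward that strongly prefers token id 2 overturns the base argmax
logits = [2.0, 1.9, 1.8, 0.1]
reward = lambda t: 2.0 if t == 2 else 0.0
print(guided_decode_step(logits, reward, beta=1.0, k=3))  # prints 2
```

Note that restricting the search to the top-k candidates keeps the per-step cost at k reward evaluations rather than one per vocabulary entry.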
Open Source Code: Yes
"Our model and code are available here. All code and models will be made available for reproducibility and further research."
Open Datasets: Yes
"During the development of our personalized reward model, we utilized datasets from multiple sources, including HelpSteer2 (Wang et al., 2024c), Rewards-in-Context (Yang et al., 2024b), and Safe RLHF (Dai et al., 2023). The P-Soups (Jang et al., 2023) evaluation dataset has been filtered and modified based on the Koala evaluation by Jang et al. (2023). The HelpSteer2 (Wang et al., 2024c) (validation split) dataset is a multi-aspect alignment dataset comprising 1,000 prompts."
Dataset Splits: No
"In the stage of personalized reward model training, we utilize training data from three datasets: HelpSteer2 (Wang et al., 2024c), Rewards-in-Context (Yang et al., 2024b), and Safe RLHF (Dai et al., 2023). For the UltraFeedback and HelpSteer2 datasets, we build data pairs by comparing the score annotations within the datasets."
Hardware Specification: Yes
"The training was executed on 4 NVIDIA H100 80GB GPUs, with a per-device batch size of 4. The time costs for decoding-time alignment, detailed in Table C2, are measured on a single NVIDIA H100 GPU."
Software Dependencies: No
"Our training code is based on Llama-Factory (Zheng et al., 2024). We performed model fine-tuning using the LoRA method, targeting all layers with a rank of 8."
Experiment Setup: Yes
"We employ the Llama-3-8B model (AI@Meta, 2024) as our backbone and append a linear layer directly after the embeddings, with an output dimension of 4096. During the decoding phase, we use greedy decoding over top-k candidates. We restrict the maximum lengths of the initial prompt and subsequent generations to 2,048 and 128 tokens, respectively. The hyperparameters, β = 1.0 and k = 10, are chosen to maximize the average reward on our validation datasets. We fine-tuned the model using the LoRA method, targeting all layers with a rank of 8. Training was executed on 4 NVIDIA H100 80GB GPUs with a per-device batch size of 4; to accommodate larger effective batch sizes, we employed 8 gradient accumulation steps. The learning rate was set at 5.0e-6, and the model was trained for 3 epochs with a cosine learning rate scheduler."
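The reported setup can be collected into a single configuration sketch; the key names below are ours for illustration, not Llama-Factory's actual argument names. One derived figure worth noting: the effective batch size is 4 GPUs × 4 per-device × 8 accumulation steps = 128.

```python
# Illustrative summary of the reported training/decoding hyperparameters.
# Key names are assumptions, not the actual Llama-Factory config schema.
config = {
    "base_model": "Llama-3-8B",
    "reward_head_dim": 4096,       # linear layer appended after the embeddings
    "lora_rank": 8,                # LoRA applied to all layers
    "learning_rate": 5.0e-6,
    "epochs": 3,
    "lr_scheduler": "cosine",
    "gpus": 4,                     # NVIDIA H100 80GB
    "per_device_batch_size": 4,
    "grad_accum_steps": 8,
    "decoding": {"beta": 1.0, "top_k": 10,
                 "max_prompt_len": 2048, "max_new_tokens": 128},
}

# effective batch size = gpus * per-device batch * gradient accumulation
effective_batch = (config["gpus"] * config["per_device_batch_size"]
                   * config["grad_accum_steps"])
print(effective_batch)  # prints 128
```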