L3Ms — Lagrange Large Language Models

Authors: Guneet Singh Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally demonstrate the versatility and efficacy of L3Ms in achieving tailored alignments for various applications." (Section 6, Experimental Results)
Researcher Affiliation | Collaboration | Guneet S. Dhillon (1), Xingjian Shi (2), Yee Whye Teh (1), Alex Smola (2); 1: University of Oxford, 2: Boson AI
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper; the derivations in Section 5.1 and Appendix B are mathematical, not algorithmic.
Open Source Code | Yes | "Our code, based on the Transformers library (Wolf et al., 2020), is available at: https://github.com/Guneet-Dhillon/l3m."
Open Datasets | Yes | "We use Ultra Chat (Ding et al., 2023), a large-scale dataset of instructional conversations, as our task data to induce instruction-following capabilities. We use the Helpful and Harmless (Bai et al., 2022) preference data to learn two reward models, respectively."
Dataset Splits | Yes | "Consequently, we obtain 340k training samples, 1.7k validation samples, and 1.7k test samples, split randomly since the dataset does not contain train-val-test splits."
Hardware Specification | Yes | "We run all experiments on NVIDIA H100s."
Software Dependencies | No | The paper mentions the Transformers library (Wolf et al., 2020) but does not provide version numbers for it or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | "We fine-tune LLMs for 1 epoch on the task data, with a mini-batch size of 64. We use Adam with a learning rate of 10^-6 and a cosine learning rate scheduler (with 5% of the epoch used for warmup). We set weight decay to 0.1 and the gradient clipping maximum norm to 1. We utilize 16-bit (mixed) precision training and gradient checkpointing. We exponentially decay the log-barrier parameter µ during fine-tuning from 1 to 10^-6 and use a smoothing factor of 0.1 for the exponential moving average. Lastly, we use top-p sampling (p set to 0.9) for response generation."
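Three pieces of the quoted setup can be sketched in minimal Python: the exponential decay of the log-barrier parameter µ from 1 to 10^-6, the exponential moving average with smoothing factor 0.1, and top-p (nucleus) sampling with p = 0.9. This is a hypothetical illustration, not the authors' implementation; in particular, the geometric per-step decay shape and the convention that the smoothing factor weights the newest observation are assumptions.

```python
import random

def mu_schedule(step, total_steps, mu_start=1.0, mu_end=1e-6):
    # Assumed geometric decay of the log-barrier parameter mu,
    # interpolating from mu_start to mu_end over fine-tuning.
    frac = step / total_steps
    return mu_start * (mu_end / mu_start) ** frac

def ema_update(prev, new, smoothing=0.1):
    # Exponential moving average; assumed convention: the smoothing
    # factor 0.1 is the weight on the newest observation.
    return smoothing * new + (1.0 - smoothing) * prev

def top_p_sample(probs, p=0.9, rng=None):
    # Top-p (nucleus) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches p, then sample from that set.
    rng = rng or random
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    r = rng.random() * total
    for i in keep:
        r -= probs[i]
        if r <= 0:
            return i
    return keep[-1]
```

For example, `mu_schedule(500, 1000)` returns the geometric midpoint 10^-3 of the decay from 1 to 10^-6, and `top_p_sample([0.5, 0.3, 0.2], p=0.7)` can only return token 0 or 1, since those two already cover 0.8 ≥ 0.7 of the probability mass.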