Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
Authors: Yunzhen Yao, Lie He, Michael Gastpar
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic data and LLM alignment data validate our theoretical findings, showing that sparsity-aware methods significantly reduce sample complexity and improve prediction accuracy. Our experimental evaluations demonstrate that sparsity-aware estimators outperform widely used baselines in reward modeling, evaluated on both synthetic datasets and LLM alignment datasets using popular language models. |
| Researcher Affiliation | Academia | 1 LINX, EPFL, Lausanne, Switzerland; 2 Key Laboratory of Interdisciplinary Research of Computation and Economics (Shanghai University of Finance and Economics), Ministry of Education, China; 3 School of Computing and Artificial Intelligence, Shanghai University of Finance and Economics, Shanghai, China. |
| Pseudocode | No | The paper describes methods verbally and mathematically but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at this link: https://github.com/yaoyzh/SparsePreferenceLearning |
| Open Datasets | Yes | We train reward models using the rm-static dataset (Bai et al., 2022; https://huggingface.co/datasets/Dahoas/rm-static) and the SHP dataset (Ethayarajh et al., 2022; https://huggingface.co/datasets/stanfordnlp/SHP). |
| Dataset Splits | No | The paper mentions using datasets for training and evaluating test accuracy but does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or explicit standard splits). |
| Hardware Specification | No | The paper mentions the use of pretrained language models (e.g., Pythia-70M, Llama-3.2-1B) but does not specify any hardware details like GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper mentions using the SciPy package (Virtanen et al., 2020) and that the code is based on DeepSpeed-Chat (Yao et al., 2023), but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The learning rate is set to 10^-5, and the weight decay is set to 0.1. The batch size is 8 for Pythia-70M, 16 for Llama-3.2-1B, and 32 for Llama-3.2-3B, and the training runs for 1 epoch. The regularization hyperparameter β for the ℓ1-regularized method is selected from the range 10^[-4.5:0.5:0] ∪ {2, 4, 8}. Each β value, including β = 0, is evaluated across 5 trials with random seeds in {0, 1, 2, 3, 4} for Pythia-70M and Llama-3.2-1B, and 3 trials with random seeds in {0, 1, 2} for Llama-3.2-3B. |
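
The reported sweep can be sketched as a run-enumeration script. This is a minimal illustration, not the authors' code: it assumes the range notation 10^[-4.5:0.5:0] means log-spaced values from 10^-4.5 to 10^0 in steps of 0.5 in the exponent, joined with the extra values {2, 4, 8} and the β = 0 baseline.

```python
import itertools

# Assumed reading of the paper's grid notation: 10^-4.5 ... 10^0
# in exponent steps of 0.5, plus {2, 4, 8} and the beta = 0 baseline.
beta_grid = [10 ** (e / 2) for e in range(-9, 1)]  # 10 log-spaced values
beta_grid += [2.0, 4.0, 8.0, 0.0]                  # 14 values in total

# Per-model batch sizes and seeds as stated in the setup row.
configs = {
    "Pythia-70M":   {"batch_size": 8,  "seeds": [0, 1, 2, 3, 4]},
    "Llama-3.2-1B": {"batch_size": 16, "seeds": [0, 1, 2, 3, 4]},
    "Llama-3.2-3B": {"batch_size": 32, "seeds": [0, 1, 2]},
}

# Enumerate every (model, beta, seed) run with the shared hyperparameters.
runs = [
    {"model": model, "lr": 1e-5, "weight_decay": 0.1, "epochs": 1,
     "batch_size": cfg["batch_size"], "beta": beta, "seed": seed}
    for model, cfg in configs.items()
    for beta, seed in itertools.product(beta_grid, cfg["seeds"])
]

print(len(runs))  # 14 betas x (5 + 5 + 3) seeds = 182 runs
```

Enumerating the grid up front makes the trial count explicit and is a common way to drive such sweeps from a single launcher script.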