Decoupling Angles and Strength in Low-rank Adaptation
Authors: Massimo Bini, Leander Girrbach, Zeynep Akata
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our proposed DeLoRA method for image generation, natural language understanding, and instruction tuning tasks. We begin by providing a detailed description of these tasks and their relevance. To justify our design choices, we present a comprehensive ablation study that highlights the key innovations of DeLoRA. Finally, we demonstrate that DeLoRA not only matches or exceeds the performance of LoRA and other state-of-the-art methods but also exhibits superior robustness. |
| Researcher Affiliation | Academia | ¹University of Tübingen, Tübingen AI Center, ²Helmholtz Munich, ³Technical University of Munich, Munich Center for Machine Learning, MDSI |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual explanations but does not contain explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ExplainableML/DeLoRA. |
| Open Datasets | Yes | The dataset, sourced from (Ruiz et al., 2023), comprises 30 subjects... For training and evaluation, we utilize semantic maps and images from the ADE20K dataset (Zhou et al., 2019)... on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018)... finetuning LLaMA-2-7B (Touvron et al., 2023b) on the Alpaca dataset (Taori et al., 2023). |
| Dataset Splits | Yes | Following Wu et al. (2024c), for each benchmark task, we split the publicly available validation set in two subsets as reported in Table 7. When validation sets are larger than 2K, a 1K subset is used as new validation set, and the remaining as test set, otherwise the validation is split in two equally sized subsets. We use the new validation set to tune the hyperparameters on seed 42. Then, best hyperparameters are used to evaluate test performance for seeds 42, 43, 44, 45, 46. |
| Hardware Specification | Yes | The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer JUWELS (Alvarez, 2021) at Jülich Supercomputing Centre (JSC). |
| Software Dependencies | No | The paper mentions various models and tasks like Stable Diffusion, RoBERTa-base, and LLaMA-2, and uses 'bfloat16 precision', but does not provide specific version numbers for any software libraries, frameworks, or dependencies used for implementation or experimentation. |
| Experiment Setup | Yes | For LoRA and DoRA we followed best practices and fixed lambda to twice the rank during hyperparameter search. Optimal learning rate for both methods is 6e-4. For DeLoRA we fixed the λ scaling parameter to 1e-3, and found an optimal learning rate of 2e-2 for the BA matrices. ... For larger datasets (MNLI, SST-2, QNLI, QQP) we fix the λ scaling learning rate to 3e-3, while for smaller datasets we fix it to 1e-2. For other hyperparameters we run a small grid search. Best values are reported in Table 9. |
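The Dataset Splits row describes a concrete procedure for carving test sets out of public validation sets. A minimal sketch of that rule follows; the function name `split_validation` and the use of index lists are illustrative assumptions, not the authors' code.

```python
import random

def split_validation(val_indices, seed=42):
    """Split a public validation set into (new_validation, test) subsets,
    per the rule quoted above: if the set has more than 2K examples,
    a 1K subset becomes the new validation set and the remainder the
    test set; otherwise the set is split into two equal halves."""
    indices = list(val_indices)
    random.Random(seed).shuffle(indices)  # seed 42 matches the tuning seed
    val_size = 1000 if len(indices) > 2000 else len(indices) // 2
    return indices[:val_size], indices[val_size:]

# A 9,815-example validation set yields a 1K validation subset
# and an 8,815-example test subset.
new_val, test = split_validation(range(9815))
```

Hyperparameters would then be tuned on `new_val` with seed 42, and the best configuration evaluated on `test` across seeds 42-46, as the report states.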
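The Experiment Setup row hints at why DeLoRA uses a fixed λ with its own learning rate, separate from the BA matrices: the title's "decoupling" refers to normalizing the low-rank directions so that λ alone controls update strength. The sketch below is an assumption reconstructed from the paper title and the λ/BA split quoted above, not the authors' reference implementation.

```python
import numpy as np

def delora_delta(B, A, lam, rank):
    """Hypothetical decoupled low-rank update: each rank-one component
    b_i a_i^T is normalized to unit Frobenius norm, so the angular
    direction comes from B and A while the overall strength is bounded
    by the scalar lam (||delta||_F <= lam by the triangle inequality)."""
    b_norm = np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-8)
    a_norm = np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-8)
    return (lam / rank) * (B / b_norm) @ (A / a_norm)

rng = np.random.default_rng(0)
B, A = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))
delta = delora_delta(B, A, lam=1e-3, rank=4)
```

Under this parameterization, rescaling B or A leaves the update unchanged, which would explain why the BA learning rate (2e-2) and the λ learning rate (1e-2 or 3e-3) can be tuned independently.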