dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation

Authors: Emanuele Zangrando, Sara Venturini, Francesco Rinaldi, Francesco Tudisco

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On top of a detailed theoretical analysis of the method, we provide different numerical experiments showcasing its effectiveness." "Additionally, we conduct extensive numerical experiments across a range of benchmarks, including natural language understanding and generation tasks, to demonstrate the effectiveness of dEBORA. Our results show that dEBORA outperforms existing low-rank adaptation methods in both efficiency and performance, particularly in settings with stringent parameter budgets." Section 7 details the experimental setup and results.
Researcher Affiliation | Collaboration | 1 Gran Sasso Science Institute, L'Aquila, Italy; 2 MOBS Lab, Northeastern University, Boston, US; 3 Department of Mathematics, University of Padova, Padova, Italy; 4 School of Mathematics and Maxwell Institute, University of Edinburgh, UK; 5 Miniml.AI Ltd, UK
Pseudocode | Yes | Algorithm 1 dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation... Algorithm 2 Stochastic approximation of the hypergradient G̃(s)
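The paper's Algorithm 2 approximates the hypergradient G̃(s) stochastically; its exact steps are not reproduced in this report. As a generic illustration only, the sketch below shows the common one-step-unrolled hypergradient idea on a toy scalar problem. All losses and function names here are our own assumptions, not the paper's.

```python
# Toy bilevel setup (illustrative, NOT the paper's Algorithm 2):
# lower-level loss g(w, s) = (w - s)^2, upper-level loss f(w) = (w - 1)^2.
# Approximate the inner solution by one gradient step on g, then
# differentiate the outer loss through that step (chain rule).

def inner_loss_grad(w, s):
    # gradient of g(w, s) = (w - s)^2 with respect to w
    return 2.0 * (w - s)

def hypergradient(w, s, eta=0.1):
    """Approximate d/ds f(w_plus(s)), with w_plus = w - eta * grad_w g(w, s).

    For w_plus = w - eta * 2 * (w - s), we have d w_plus / d s = 2 * eta,
    so the chain rule gives f'(w_plus) * 2 * eta.
    """
    w_plus = w - eta * inner_loss_grad(w, s)
    return 2.0 * (w_plus - 1.0) * (2.0 * eta)

def hypergradient_fd(w, s, eta=0.1, h=1e-6):
    # central finite-difference check of the same quantity
    def outer(s_):
        w_plus = w - eta * inner_loss_grad(w, s_)
        return (w_plus - 1.0) ** 2
    return (outer(s + h) - outer(s - h)) / (2.0 * h)

if __name__ == "__main__":
    # the analytic and finite-difference estimates should agree closely
    print(abs(hypergradient(0.0, 0.5) - hypergradient_fd(0.0, 0.5)) < 1e-6)
```

In the stochastic variant, the inner and outer gradients would each be evaluated on mini-batches (here, from the two data partitions feeding f1 and f2), making the resulting hypergradient estimate stochastic.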
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code, nor does it include a direct link to a code repository for the methodology described. While it references the 'peft' library, that is not the authors' own implementation code.
Open Datasets | Yes | "In our first experiment, we fine-tuned DeBERTa V3 (He et al., 2023) on the GLUE benchmark (Wang et al., 2019)... We tested ResNet50 (He et al., 2015) on CIFAR-10 (Krizhevsky & Hinton, 2009) and Stable Diffusion (Rombach et al., 2021)..."
Dataset Splits | No | "To create the two loss functions f1, f2, we randomly partitioned the dataset into equally sized subsets, using one partition for the upper-level loss and the other for the lower-level loss." The paper describes an internal splitting strategy for the bilevel optimization losses (f1 and f2), but it does not specify the conventional training, validation, and test splits for the datasets used in the experiments (GLUE, CIFAR-10, Stable Diffusion), which would be needed for reproduction.
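The splitting strategy quoted above — one random, equally sized partition for the upper-level loss and one for the lower-level loss — can be sketched as follows. This is a minimal illustration of the described strategy, with function and variable names of our choosing.

```python
import random

def split_for_bilevel(indices, seed=0):
    """Shuffle dataset indices and return two disjoint, equally sized halves:
    one for the upper-level loss f1, one for the lower-level loss f2."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)  # fixed seed for a reproducible split
    half = len(idx) // 2
    # if the dataset size is odd, one leftover item is dropped
    return idx[:half], idx[half:half * 2]

upper_idx, lower_idx = split_for_bilevel(range(10))
assert set(upper_idx).isdisjoint(lower_idx)          # partitions do not overlap
assert len(upper_idx) == len(lower_idx) == 5         # equally sized
```

Note that this reconstructs only the f1/f2 partition the paper mentions; the train/validation/test splits flagged as missing above are a separate, unspecified matter.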
Hardware Specification | Yes | "All experiments were run on an 80GB NVIDIA A100 GPU."
Software Dependencies | No | The paper mentions several tools and models, such as DeBERTa V3 and PEFT, but does not specify exact version numbers for any software dependencies (e.g., Python, PyTorch, CUDA, or specific library versions).
Experiment Setup | Yes | "All methods used the same training settings: a constant learning rate of 5·10⁻¹, weight decay of 1·10⁻³, LoRA α = 32, and no dropout."
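For convenience, the shared training settings reported above can be collected into a plain configuration dict. This is our own sketch, not code from the paper; the key names are assumptions.

```python
# Shared training settings as reported in the paper, gathered into one place.
# Key names are illustrative; the numeric values are the reported ones.
train_config = {
    "learning_rate": 5e-1,   # constant learning rate, 5·10⁻¹
    "weight_decay": 1e-3,    # 1·10⁻³
    "lora_alpha": 32,        # LoRA α
    "lora_dropout": 0.0,     # "no dropout"
}
```

Keeping such settings in a single dict makes it easy to log them alongside results, which helps with exactly the kind of reproducibility checks this report performs.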