LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments across natural language understanding, dialogue generation, mathematical reasoning, code generation, and image classification tasks, demonstrating that LoRA-Pro substantially improves LoRA's performance, effectively narrowing the gap with full fine-tuning. Our code is publicly available at https://github.com/mrflogs/LoRA-Pro."
Researcher Affiliation | Academia | Zhengbo Wang (1,2), Jian Liang (2,3), Ran He (2,3), Zilei Wang (1), Tieniu Tan (2,4); (1) University of Science and Technology of China; (2) NLPR & MAIS, Institute of Automation, Chinese Academy of Sciences; (3) School of Artificial Intelligence, University of Chinese Academy of Sciences; (4) Nanjing University; EMAIL, EMAIL
Pseudocode | Yes | "Appendix C, Optimization Algorithms: In this section, we present the pseudo-code for implementing our LoRA-Pro method using the SGD (Sutskever et al., 2013) and AdamW (Loshchilov & Hutter, 2019) optimizers. The details are provided in Algorithm 1 and Algorithm 2, respectively."
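For orientation, the pseudo-code quoted above operates on the standard LoRA reparameterization W_eff = W0 + (α/r)·B·A, where only the low-rank factors A and B are trained. A minimal pure-Python sketch of that reparameterization follows; the toy dimensions are illustrative, and this is the generic LoRA form, not the paper's LoRA-Pro gradient adjustment itself:

```python
def matmul(X, Y):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W0, A, B, r, alpha):
    """Standard LoRA merge: W0 + (alpha / r) * B @ A.
    W0 is the frozen pretrained weight; B (d_out x r) and A (r x d_in)
    are the trained low-rank adapters."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, BA)]

# Toy example: d_out = d_in = 2, rank r = 1, alpha = 2 (so scale = 2,
# the same alpha/r ratio as the paper's default r = 8, alpha = 16).
W0 = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # d_out x r
A = [[0.0, 0.5]]     # r x d_in
W = lora_effective_weight(W0, A, B, r=1, alpha=2)
```

With these toy values the merged weight is W0 plus a rank-1 perturbation in the first row only, which is exactly the low-rank structure the SGD/AdamW pseudo-code updates.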
Open Source Code | Yes | "Our code is publicly available at https://github.com/mrflogs/LoRA-Pro."
Open Datasets | Yes | "First, we assess natural language understanding capabilities using the GLUE benchmark by fine-tuning the T5-base (Raffel et al., 2020) model in Section 3.1. Next, we evaluate its capabilities in dialogue generation, mathematical reasoning, and code generation using the Llama-2-7B model (Touvron et al., 2023), covered in Section 3.2. We then examine LoRA-Pro's effectiveness on image classification tasks using the CLIP-ViT-B/16 (Radford et al., 2021) model in Section 3.3."
Dataset Splits | Yes | "For the dialogue generation task, we fine-tune the Llama-2-7B (Touvron et al., 2023) model on a 52k subset of the WizardLM dataset (Xu et al., 2024) and evaluate it using the MT-Bench dataset (Zheng et al., 2024a). For the math task, we fine-tune the Llama-2-7B (Touvron et al., 2023) model on a 100k sample from the MetaMathQA dataset (Yu et al., 2024). The model is then evaluated on the GSM8K test set (Cobbe et al., 2021), and we report accuracy as the metric. For the coding task, we fine-tune the Llama-2-7B (Touvron et al., 2023) model on a 100k subset of the Code-Feedback dataset (Zheng et al., 2024b) and test it on the HumanEval dataset (Chen et al., 2021), reporting the PASS@1 metric."
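The PASS@1 metric cited above is the k = 1 case of the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021): pass@k = E[1 - C(n-c, k)/C(n, k)] over problems, where n completions are sampled per problem and c of them pass the unit tests. A minimal sketch, with a hypothetical list of per-problem pass counts:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    1 - C(n - c, k) / C(n, k), where n completions are sampled per
    problem and c of them pass. If fewer than k samples fail, the
    estimate is exactly 1."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem (n = 1), pass@1 reduces to the
# plain pass rate. `results` is a hypothetical pass/fail indicator list.
results = [1, 0, 1, 1]
score = sum(pass_at_k(1, c, 1) for c in results) / len(results)
```

With more samples per problem (n > 1), the combinatorial form corrects for the variance of small sample counts, which is why it is preferred over naively averaging empirical pass rates.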
Hardware Specification | Yes | "All experiments are conducted on NVIDIA RTX A6000 GPUs. Memory cost is measured using a single A6000 GPU with a batch size of 1. Training time is recorded on the MetaMathQA dataset using 8 A100 GPUs with DeepSpeed ZeRO-2 stage optimization."
Software Dependencies | No | "To ensure a fair comparison, we align our experimental setup with that of LoRA-GA (Wang et al., 2024a). By default, we fine-tune the model using the AdamW optimizer (Loshchilov & Hutter, 2019) with hyper-parameters β1 = 0.9, β2 = 0.999, and weight decay set to 0. We implement a cosine learning rate schedule with a warmup ratio of 0.03. LoRA is applied to all linear modules, excluding the embedding layer, normalization layer, and classification head."
Experiment Setup | Yes | "Training details. To ensure a fair comparison, we align our experimental setup with that of LoRA-GA (Wang et al., 2024a). By default, we fine-tune the model using the AdamW optimizer (Loshchilov & Hutter, 2019) with hyper-parameters β1 = 0.9, β2 = 0.999, and weight decay set to 0. We implement a cosine learning rate schedule with a warmup ratio of 0.03. LoRA is applied to all linear modules, excluding the embedding layer, normalization layer, and classification head. By default, we set the rank r = 8 and α = 16. For natural language understanding tasks, we fine-tune the T5-base (Raffel et al., 2020) model with a learning rate of 1e-4. The sequence length is set to 128, and the training batch size is 32."
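The quoted schedule (cosine decay with a warmup ratio of 0.03) maps a training step to a learning rate. A minimal sketch follows; the peak learning rate 1e-4 matches the T5-base setting quoted above, while the linear-warmup shape and decay-to-zero floor are common conventions assumed here, not details confirmed by the paper:

```python
import math

def cosine_lr_with_warmup(step, total_steps, peak_lr=1e-4, warmup_ratio=0.03):
    """Linear warmup over the first warmup_ratio of training, then
    cosine decay from peak_lr down to 0 over the remaining steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Over a hypothetical 1000-step run, the lr ramps to 1e-4 within the
# first 30 steps (3% warmup) and then decays to ~0 by the final step.
lrs = [cosine_lr_with_warmup(s, 1000) for s in range(1001)]
```

The warmup fraction keeps early AdamW updates small while its moment estimates stabilize, and the cosine tail anneals the step size smoothly toward the end of training.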