LoRA-Gen: Specializing Large Language Model via Online LoRA Generation
Authors: Yicheng Xiao, Lin Song, Rui Yang, Cheng Cheng, Yixiao Ge, Xiu Li, Ying Shan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to validate the effectiveness of LoRA-Gen on various commonsense reasoning tasks as well as an agent benchmark. The results demonstrate that our method balances both performance and efficiency, showing significant advantages across eight language datasets. For the edge-side model of TinyLLaMA-1.1B, LoRA-Gen outperforms vanilla LoRA fine-tuning by a remarkable margin with only 16% sequence length, +1.3% on the harmonic mean of accuracy, and a 2.1x speedup. |
| Researcher Affiliation | Collaboration | 1Tsinghua University 2ARC Lab, Tencent PCG 3The University of Hong Kong 4Xi'an Jiaotong University. Correspondence to: Xiu Li <EMAIL>. |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations (e.g., equations 1-8 in Section 3.1 and 3.2), but does not include a distinct pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Following (Dou et al., 2024; Li et al., 2024a), we select eight widely-used benchmarks to assess the reasoning ability of LoRA-Gen across various knowledge domains ranging from natural science to daily life. One classification task: BoolQ (Clark et al., 2019). Five question-answering tasks: ARC-c (Clark et al., 2018), ARC-e (Clark et al., 2018), OpenBookQA (Mihaylov et al., 2018), PIQA (Bisk et al., 2020) and SocialIQA (Sap et al., 2019). One sentence-completion task: HellaSwag (Zellers et al., 2019) and a fill-in-the-blank task: WinoGrande (Sakaguchi et al., 2020). We utilize GPT4Tools (Yang et al., 2024a), which provides a benchmark to evaluate the ability of LLMs to use tools... |
| Dataset Splits | Yes | We divide eight commonly used datasets into two parts: one as the multi-task learning set, including ARC-c, ARC-e, OpenBookQA, BoolQ, and SocialIQA, and the other as an unseen test set, including HellaSwag, WinoGrande, and PIQA. We randomly sample to construct multi-shot training data. ... Table 12 outlines the data scale (train/test examples) for each reasoning task: ARC-c 1120/1171, ARC-e 2250/2380, OBQA 4957/500, BoolQ 9427/3270, SIQA 33410/1954, HellaSwag 39905/10042, WinoGrande 9248/1267, PIQA 16100/1838. |
| Hardware Specification | Yes | All the latencies are measured on the same GPU with 40GB of memory. ... The models are trained with eight NPUs (64GB memory per device) by default. ... Latency is measured on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions the use of an optimizer (AdamW) and a project (lm-evaluation-harness) but does not provide specific version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We deploy LLaMA3-8B (Grattafiori et al., 2024) as the cloud-side LM during online task-specific LoRA parameter generation. We finetune the q and v projection layers of the LLM with a LoRA adapter. The number of experts is 8 and we set K in the routing function TOP-K to 2 by default. The coefficient α for the auxiliary loss Lcv is set to 0.01. ... The models are trained with eight NPUs (64GB memory per device) by default. We set the betas and momentum of the AdamW optimizer to (0.9, 0.999) and 0.9, respectively. During training, we utilize a cosine scheduler with an initial learning rate of 2e-5 and weight decay of 0.1. The details are shown in Table 10: optimizer AdamW; learning rate 2e-5; warm-up steps 50; weight decay 0.1; optimizer momentum β1, β2 = 0.9, 0.999; batch size 64; epochs 4; max length 2048; LoRA attention dimension (r) 16; LoRA scaling alpha (α) 16; LoRA dropout 0.05. |
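The adapter configuration above (rank r = 16, scaling α = 16 on the q and v projections) follows the standard LoRA forward pass, W x + (α/r)·B A x. A minimal, dependency-free sketch under that assumption — the function name, matrix contents, and toy dimensions here are illustrative, not taken from the paper:

```python
def lora_forward(x, W, A, B, r, alpha):
    """Frozen projection W plus a low-rank LoRA update: W @ x + (alpha/r) * B @ A @ x.

    Shapes: W is (out, in), A is (r, in), B is (out, r).
    """
    base = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    ax = [sum(A[k][j] * x[j] for j in range(len(x))) for k in range(r)]
    delta = [(alpha / r) * sum(B[i][k] * ax[k] for k in range(r)) for i in range(len(B))]
    return [b + d for b, d in zip(base, delta)]

# Toy example with r = 2 (the paper uses r = 16). B is zero-initialized,
# so the adapter starts as a no-op, as in standard LoRA initialization.
out = lora_forward([1.0, 2.0],
                   W=[[1.0, 0.0], [0.0, 1.0]],
                   A=[[1.0, 1.0], [1.0, 1.0]],
                   B=[[0.0, 0.0], [0.0, 0.0]],
                   r=2, alpha=2)
```

With B zero-initialized the output equals the frozen projection; training then moves only A and B, leaving W untouched.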
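The quoted setup also fixes the expert routing to TOP-K with 8 experts and K = 2, but does not spell out the gating formula. A dependency-free sketch of one common formulation — keep the K largest gating logits and renormalize them with a softmax — where the function name `top_k_routing` and the renormalization choice are assumptions, not taken from the paper:

```python
import math

def top_k_routing(logits, k=2):
    """Select the top-k experts and softmax-renormalize their gate weights."""
    # Indices of the k largest gating logits (one logit per expert).
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (max-subtracted for stability).
    m = max(logits[i] for i in topk)
    exps = {i: math.exp(logits[i] - m) for i in topk}
    z = sum(exps.values())
    return {i: exps[i] / z for i in topk}

# Example: 8 experts, K = 2, matching the reported configuration.
gates = top_k_routing([0.1, 2.0, -0.5, 1.5, 0.0, 0.3, -1.0, 0.7], k=2)
```

The returned dict maps the two selected expert indices to gate weights that sum to 1; the remaining six experts receive zero weight and are skipped entirely.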