Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Authors: Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments tell that LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. |
| Researcher Affiliation | Collaboration | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. 2 Huawei Technologies. 3 Institute of Artificial Intelligence, Xiamen University. 4 Peng Cheng Laboratory, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1: Dynamic Low-rank Sparse Adaptation (LoSA) |
| Open Source Code | Yes | Code is available at https://github.com/wzhuang-xmu/LoSA. |
| Open Datasets | Yes | We report perplexity of sparse LLM on WikiText-2 (Merity et al., 2016) dataset and use lm-eval-harness (Gao et al., 2021) to evaluate the zero-shot accuracy on downstream datasets, including HellaSwag (Zellers et al., 2019), Winogrande (Sakaguchi et al., 2021), BoolQ (Clark et al., 2019), OpenBookQA (Mihaylov et al., 2018), PIQA (Bisk et al., 2020), ARC-Easy, and ARC-Challenge (Clark et al., 2018). |
| Dataset Splits | No | The paper mentions using a "10K subset from the Alpaca-GPT4 (Peng et al., 2023) to construct our fine-tuning dataset" and "128 sequences sampled from the C4 training set (Raffel et al., 2020) for sparsification". While standard benchmark datasets are used for evaluation, no explicit train/test/validation splits (percentages, counts, or references to specific split configurations) are provided for any of the datasets used for fine-tuning or evaluation. |
| Hardware Specification | Yes | For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. ... All experiments were conducted on NVIDIA A100 80GB GPUs. ... We measured the end-to-end time of the model generating tokens using the DeepSparse (Neural Magic, 2021) inference engine on an Intel(R) Xeon(R) Silver 4314 CPU and the nm-vllm (Neural Magic, 2024) inference engine on an NVIDIA RTX 4090 24GB GPU. |
| Software Dependencies | No | The paper mentions using the 'Paged AdamW optimizer', 'DeepSparse inference engine', 'nm-vllm inference engine', and 'lm-eval-harness' but does not specify version numbers for these software components or any other libraries/frameworks. |
| Experiment Setup | Yes | During the fine-tuning process, we employed the Paged AdamW optimizer (Dettmers et al., 2024), setting a maximum gradient norm of 0.3. The learning rate followed a linear learning rate schedule and was set to 2 × 10⁻⁴. ... We set the fine-tuning steps T = 5 and initial average rank Ω1 = 6. |
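The fine-tuning hyperparameters quoted in the Experiment Setup row (linear learning-rate schedule at 2 × 10⁻⁴, maximum gradient norm 0.3) can be sketched in plain Python. This is a minimal illustration of those two mechanisms, not the authors' implementation; the `total_steps` value and function names are illustrative assumptions.

```python
# Sketch of the fine-tuning hyperparameters reported in the paper.
# BASE_LR and MAX_GRAD_NORM come from the quoted setup; total_steps
# below is an illustrative assumption, not a value from the paper.

BASE_LR = 2e-4        # learning rate reported in the paper
MAX_GRAD_NORM = 0.3   # maximum gradient norm reported in the paper

def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    frac = max(0.0, 1.0 - step / total_steps)
    return base_lr * frac

def clip_by_global_norm(grads: list[float], max_norm: float = MAX_GRAD_NORM) -> list[float]:
    """Rescale gradients so their global L2 norm does not exceed max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

# Example: learning rate halfway through an assumed 1000-step run.
print(linear_lr(500, 1000))  # prints 0.0001
```

In practice the paper's setup would pair this schedule with the Paged AdamW optimizer (Dettmers et al., 2024); the sketch isolates only the schedule and clipping rules that the excerpt states explicitly.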