Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Authors: Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments tell that LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. |
| Researcher Affiliation | Collaboration | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. 2 Huawei Technologies. 3 Institute of Artificial Intelligence, Xiamen University. 4 Peng Cheng Laboratory, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1: Dynamic Low-rank Sparse Adaptation (LoSA) |
| Open Source Code | Yes | Code is available at https://github.com/wzhuang-xmu/LoSA. |
| Open Datasets | Yes | We report perplexity of sparse LLM on WikiText-2 (Merity et al., 2016) dataset and use lm-eval-harness (Gao et al., 2021) to evaluate the zero-shot accuracy on downstream datasets, including HellaSwag (Zellers et al., 2019), Winogrande (Sakaguchi et al., 2021), BoolQ (Clark et al., 2019), OpenBookQA (Mihaylov et al., 2018), PIQA (Bisk et al., 2020), ARC-Easy, and ARC-Challenge (Clark et al., 2018). |
| Dataset Splits | No | The paper mentions using a "10K subset from the Alpaca-GPT4 (Peng et al., 2023) to construct our fine-tuning dataset" and "128 sequences sampled from the C4 training set (Raffel et al., 2020) for sparsification". While standard benchmark datasets are used for evaluation, no explicit train/test/validation splits (percentages, counts, or references to specific split configurations) are provided for any of the datasets used for fine-tuning or evaluation. |
| Hardware Specification | Yes | For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. ... All experiments were conducted on NVIDIA A100 80GB GPUs. ... We measured the end-to-end time of the model generating tokens using the DeepSparse (Neural Magic, 2021) inference engine on an Intel(R) Xeon(R) Silver 4314 CPU and the nm-vllm (Neural Magic, 2024) inference engine on an NVIDIA RTX 4090 24GB GPU. |
| Software Dependencies | No | The paper mentions using the 'Paged AdamW optimizer', 'DeepSparse inference engine', 'nm-vllm inference engine', and 'lm-eval-harness' but does not specify version numbers for these software components or any other libraries/frameworks. |
| Experiment Setup | Yes | During the fine-tuning process, we employed the Paged AdamW optimizer (Dettmers et al., 2024), setting a maximum gradient norm of 0.3. The learning rate followed a linear learning rate schedule and was set to 2 × 10⁻⁴. ... We set the fine-tuning steps T = 5 and initial average rank Ω1 = 6. |
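The fine-tuning hyperparameters quoted in the Experiment Setup row (linear learning-rate schedule at 2 × 10⁻⁴, maximum gradient norm 0.3) can be sketched in plain Python. This is a minimal illustration of those two mechanisms, not the authors' implementation; the `total_steps` value and function names are illustrative assumptions.

```python
# Sketch of the fine-tuning hyperparameters reported in the paper.
# BASE_LR and MAX_GRAD_NORM come from the quoted setup; total_steps
# below is an illustrative assumption, not a value from the paper.

BASE_LR = 2e-4        # learning rate reported in the paper
MAX_GRAD_NORM = 0.3   # maximum gradient norm reported in the paper

def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    frac = max(0.0, 1.0 - step / total_steps)
    return base_lr * frac

def clip_by_global_norm(grads: list[float], max_norm: float = MAX_GRAD_NORM) -> list[float]:
    """Rescale gradients so their global L2 norm does not exceed max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

# Example: learning rate halfway through an assumed 1000-step run.
print(linear_lr(500, 1000))  # prints 0.0001
```

In practice the paper's setup would pair this schedule with the Paged AdamW optimizer (Dettmers et al., 2024); the sketch isolates only the schedule and clipping rules that the excerpt states explicitly.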