Targeted Low-rank Refinement: Enhancing Sparse Language Models with Precision

Authors: Li Shen, Anke Tang, Yong Luo, Tao Sun, Han Hu, Xiaochun Cao

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on LLaMA models validate our method's effectiveness across various pruning techniques and sparsity levels. At 50% sparsity, it reduces perplexity by 53.9% compared to conventional magnitude pruning on LLaMA-7B. Section 5 is titled "Experiment" and details evaluations on datasets like WikiText-2, TruthfulQA, GSM8K, ARC-C, and MMLU, presenting perplexity results in tables and figures.
Researcher Affiliation | Academia | 1School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China 2National Engineering Research Center for Multimedia Software, School of Computer Science and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, Hubei, China 3National University of Defense Technology, Hunan, China 4School of Information and Electronics, Beijing Institute of Technology, Beijing, China 5Key Laboratory of Cyberspace Security, Ministry of Education, China. Correspondence to: Anke Tang <EMAIL>, Yong Luo <EMAIL>. All authors are affiliated with universities or national research institutions, and the provided email addresses are academic domains.
Pseudocode | Yes | Algorithm 1: The Proposed Iterative Weight Update Method
1: Inputs: dense weight matrix W, binary mask P, target rank k, number of iterations T
2: Initialize S^(0) ← W ⊙ P
3: for t = 0 to T−1 do
4:   L^(t) ← W − S^(t)
5:   Compute SVD: L^(t) = U^(t) Σ^(t) V^(t)⊤
6:   r^(t) ← 1 + ⌊((k−1)/(T−1)) · t⌋
7:   S^(t+1) ← S^(t) + P ⊙ (U^(t)_{:,r^(t):} Σ^(t)_{r^(t):} (V^(t)_{:,r^(t):})⊤)
8: end for
9: L^(T) ← W − S^(T)
10: Returns: S^(T), L^(T)
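The iterative update in Algorithm 1 can be sketched in plain NumPy. This is a minimal sketch, not the authors' implementation: the function and variable names are mine, it assumes T ≥ 2 so the rank schedule from 1 to k is well defined, and the step on line 7 is read as folding the singular components beyond rank r^(t) back into the sparse part on the mask.

```python
import numpy as np

def iterative_refinement(W, P, k, T):
    """Split a dense matrix W into a sparse part S (supported on binary
    mask P) and a residual low-rank part L, growing the rank of the
    retained low-rank approximation from 1 up to k over T iterations.
    Sketch of Algorithm 1; assumes T >= 2."""
    S = W * P                                   # S^(0) = W ⊙ P
    for t in range(T):                          # t = 0 .. T-1
        L = W - S                               # current low-rank candidate
        U, sigma, Vt = np.linalg.svd(L, full_matrices=False)
        r = 1 + ((k - 1) * t) // (T - 1)        # rank schedule: 1 -> k
        # components of L beyond rank r, folded back into S on the mask
        tail = (U[:, r:] * sigma[r:]) @ Vt[r:, :]
        S = S + P * tail
    L = W - S
    return S, L
```

By construction S + L reconstructs W exactly, and S stays supported on the mask P, matching the data-free character of the method (no calibration data is touched).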
Open Source Code No The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We conducted our experiments using LLaMA models, evaluating their performance on WikiText-2 (Merity et al., 2016) and standard benchmarks including TruthfulQA (Lin et al., 2021), GSM8K (Cobbe et al., 2021), ARC-C (Clark et al., 2018) and MMLU (Hendrycks et al., 2020). ... When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data.
Dataset Splits | Yes | For evaluation, we use 128 sequences from the WikiText-2 dataset for perplexity evaluation. ... When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data.
Hardware Specification | No | The paper mentions "NVIDIA Ampere GPUs and newer" in Section E.2 but does not specify exact GPU models (e.g., A100, RTX 3090), CPU models, or memory details, which is not specific enough to reproduce the experimental hardware environment.
Software Dependencies | No | The paper mentions using 'torch.Tensor' and 'torch.sparse.to_sparse_semi_structured' in Sections E.1 and E.2, implying the use of PyTorch, but it does not provide specific version numbers for any software libraries or dependencies.
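For illustration, the 2:4 ("semi-structured") sparsity pattern that `torch.sparse.to_sparse_semi_structured` accelerates on Ampere-class GPUs can be sketched framework-free in NumPy: in every group of four consecutive weights along a row, keep the two with largest magnitude. This is a minimal sketch of the pattern only (the helper name `mask_2_4` is mine), not of the PyTorch conversion itself.

```python
import numpy as np

def mask_2_4(W):
    """Build a 2:4 semi-structured sparsity mask for a 2-D weight matrix:
    within each group of four consecutive entries along a row, keep the
    two with largest magnitude. Assumes the column count divides by 4."""
    rows, cols = W.shape
    assert cols % 4 == 0, "2:4 sparsity needs column count divisible by 4"
    groups = np.abs(W).reshape(rows, cols // 4, 4)
    # indices of the two largest-magnitude entries in each group of four
    top2 = np.argsort(groups, axis=-1)[..., 2:]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, top2, 1.0, axis=-1)
    return mask.reshape(rows, cols)
```

Applying such a mask yields exactly 50% sparsity with the regular structure that sparse tensor cores require, which is why the hardware note in the paper is limited to "NVIDIA Ampere GPUs and newer".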
Experiment Setup | Yes | It is important to note that our proposed iterative refinement method is entirely data-free and does not require calibration data, as shown in Algorithm 1. We consistently use T = 50 across all experiments, which is sufficient for achieving most of the potential error reduction while maintaining computational efficiency. When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data. For evaluation, we use 128 sequences from the WikiText-2 dataset for perplexity evaluation. The target rank k is consistently set to 128 for all low-rank refinement methods and sparsity levels.