Targeted Low-rank Refinement: Enhancing Sparse Language Models with Precision
Authors: Li Shen, Anke Tang, Yong Luo, Tao Sun, Han Hu, Xiaochun Cao
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on LLaMA models validate our method's effectiveness across various pruning techniques and sparsity levels. At 50% sparsity, it reduces perplexity by 53.9% compared to conventional magnitude pruning on LLaMA-7B. Section 5 is titled "Experiment" and details evaluations on datasets like WikiText-2, TruthfulQA, GSM8K, ARC-C, and MMLU, presenting perplexity results in tables and figures. |
| Researcher Affiliation | Academia | 1School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China 2National Engineering Research Center for Multimedia Software, School of Computer Science and Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan 430072, Hubei, China 3National University of Defense Technology, Hunan, China 4School of Information and Electronics, Beijing Institute of Technology, Beijing, China 5Key Laboratory of Cyberspace Security, Ministry of Education, China. Correspondence to: Anke Tang <EMAIL>, Yong Luo <EMAIL>. All authors are affiliated with universities or national research institutions, and the provided email addresses are academic domains. |
| Pseudocode | Yes | Algorithm 1 (The Proposed Iterative Weight Update Method). Inputs: dense weight matrix W, binary mask P, target rank k, number of iterations T. 1: S^(0) ← W ⊙ P. 2: for t = 0 to T−1 do: 3: L^(t) ← W − S^(t); 4: compute SVD L^(t) = U^(t) Σ^(t) V^(t)ᵀ; 5: r^(t) ← 1 + ⌊(k−1)/(T−1) · t⌋; 6: S^(t+1) ← S^(t) + P ⊙ (U^(t)_{r^(t):} Σ^(t)_{r^(t):} V^(t)ᵀ_{r^(t):}). 7: end for. 8: L^(T) ← W − S^(T). 9: Returns: S^(T), L^(T). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conducted our experiments using LLaMA models, evaluating their performance on WikiText-2 (Merity et al., 2016) and standard benchmarks including TruthfulQA (Lin et al., 2021), GSM8K (Cobbe et al., 2021), ARC-C (Clark et al., 2018) and MMLU (Hendrycks et al., 2020). ... When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data. |
| Dataset Splits | Yes | For evaluation, we use 128 sequences from the WikiText-2 dataset for perplexity evaluation. ... When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data. |
| Hardware Specification | No | The paper mentions "NVIDIA Ampere GPUs and newer" in Section E.2 but does not specify exact GPU models (e.g., A100, RTX 3090), CPU models, or memory details used for the experiments, which is not specific enough to determine the hardware used for reproduction. |
| Software Dependencies | No | The paper mentions using `torch.Tensor` and `torch.sparse.to_sparse_semi_structured` in Sections E.1 and E.2, implying the use of PyTorch, but it does not provide specific version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | It is important to note that our proposed iterative refinement method is entirely data-free and does not require calibration data, as shown in Algorithm 1. We consistently use T = 50 across all experiments, which is sufficient for achieving most of the potential error reduction while maintaining computational efficiency. When implementing Wanda pruning (Sun et al., 2023) and our method combined with Wanda (Wanda + Ours), we use 128 sequences from the allenai/c4 dataset as calibration data. For evaluation, we use 128 sequences from the WikiText-2 dataset for perplexity evaluation. The target rank k is consistently set to 128 for all low-rank refinement methods and sparsity levels. |
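The iterative weight update in Algorithm 1 can be sketched in a few lines. The version below is a minimal NumPy illustration, not the authors' implementation: the function name `iterative_refinement` and the toy 8×8 matrices are assumptions for demonstration, and it follows the extracted pseudocode (mask the weights, then repeatedly fold the SVD residual beyond a growing rank schedule back onto the mask support).

```python
import numpy as np

def iterative_refinement(W, P, k, T):
    # Sketch of Algorithm 1 (illustrative, assumes T >= 2):
    # split W into a mask-supported sparse part S and a low-rank part L = W - S.
    S = W * P                                   # S^(0) <- W (.) P
    for t in range(T):
        L = W - S                               # L^(t) <- W - S^(t)
        U, sigma, Vt = np.linalg.svd(L, full_matrices=False)
        r = 1 + (k - 1) * t // (T - 1)          # rank schedule r^(t), reaches k at t = T-1
        # residual of L beyond rank r^(t), folded back into S on the mask support
        tail = (U[:, r:] * sigma[r:]) @ Vt[r:, :]
        S = S + P * tail                        # S^(t+1) <- S^(t) + P (.) tail
    L = W - S                                   # L^(T) <- W - S^(T)
    return S, L

# Toy usage: random 8x8 weights, ~50% unstructured mask, target rank 2.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
P = (rng.random((8, 8)) < 0.5).astype(W.dtype)
S, L = iterative_refinement(W, P, k=2, T=10)
```

By construction, S stays supported on the mask (every update is multiplied by P) and the decomposition is exact: W = S + L at every step. The rank-k budget applies only to the retained component of L; truncating L to rank k afterwards is what introduces the approximation error the schedule is designed to shrink.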