An Efficient Pruner for Large Language Model with Theoretical Guarantee
Authors: Canhong Wen, Yihong Zuo, Wenliang Pan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that mAIHT outperforms state-of-the-art pruning techniques by effectively pruning the LLaMA-7B model across various evaluation metrics. In our experiments, we benchmark mAIHT against the latest state-of-the-art methods through the pruning of the LLaMA-7B model (Touvron et al., 2023). The findings indicate that mAIHT outperforms its counterparts, delivering superior pruning performance across low to moderate sparsity levels. |
| Researcher Affiliation | Academia | 1Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, China 2School of Gifted Young, University of Science and Technology of China, Hefei, China 3State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Correspondence to: Wenliang Pan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Adaptive IHT/mAIHT algorithm for layerwise pruning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only mentions using the codebase released by other researchers for comparison: "We use the original settings of these algorithms in the codebase released by Sun et al. (2023) and Boža (2024)." |
| Open Datasets | Yes | Like Wanda (Sun et al., 2023), we randomly utilize 128 calibration samples drawn from the C4 training dataset (Raffel et al., 2020). First, we train the LLaMA-7B model on the WikiText-2 dataset (Merity et al., 2016) and perform weight pruning using different methods. The tasks include BoolQ (Clark et al., 2019), RTE (Wang, 2018), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). |
| Dataset Splits | Yes | Like Wanda (Sun et al., 2023), we randomly utilize 128 calibration samples drawn from the C4 training dataset (Raffel et al., 2020). First, we train the LLaMA-7B model on the WikiText-2 dataset (Merity et al., 2016) and perform weight pruning using different methods. The tasks include BoolQ (Clark et al., 2019), RTE (Wang, 2018), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). |
| Hardware Specification | Yes | All experiments were performed with an A100 GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It only mentions using codebases from other papers for comparison, e.g., "We use the original settings of these algorithms in the codebase released by Sun et al. (2023) and Boža (2024)," but names no versioned software for its own method. |
| Experiment Setup | Yes | We set the step size α = α₁ = α₂ = 0.95/‖XᵀX‖₂, the ℓ₂ penalty coefficient µ = 0.1 (as defined in (5)), the number of mAIHT iterations t₁ = 50, and the number of weight refining iterations t₂ = 30. |
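The quoted hyperparameters map naturally onto a plain iterative-hard-thresholding (IHT) loop. The sketch below is *not* the authors' mAIHT (the paper's adaptive variant is not reproduced here); it is a minimal NumPy illustration of vanilla IHT for layerwise pruning, under the assumption that the objective is ‖XW − XW₀‖²_F + µ‖W‖²_F with the step size 0.95/‖XᵀX‖₂ reported above. All function names are hypothetical.

```python
import numpy as np

def hard_threshold(W, k):
    """Keep the k largest-magnitude entries of W, zero out the rest."""
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W.copy()
    thresh = np.partition(flat, -k)[-k]   # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

def iht_prune(X, W0, k, mu=0.1, t=50):
    """Vanilla IHT on the assumed layerwise objective
    ||X W - X W0||_F^2 + mu ||W||_F^2, keeping k nonzero weights."""
    G = X.T @ X                           # Gram matrix of calibration inputs
    alpha = 0.95 / np.linalg.norm(G, 2)   # step size from the paper's setup
    W = hard_threshold(W0, k)             # start from magnitude pruning
    for _ in range(t):
        grad = G @ (W - W0) + mu * W      # gradient of the quadratic objective
        W = hard_threshold(W - alpha * grad, k)
    return W
```

Each iterate stays exactly k-sparse, and with a step size below 1/‖XᵀX + µI‖₂ the objective is non-increasing, so the result should never be worse than one-shot magnitude pruning under this objective.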