An Efficient Pruner for Large Language Model with Theoretical Guarantee
Authors: Canhong Wen, Yihong Zuo, Wenliang Pan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that mAIHT outperforms state-of-the-art pruning techniques by effectively pruning the LLaMA-7B model across various evaluation metrics. In our experiments, we benchmark mAIHT against the latest state-of-the-art methods through the pruning of the LLaMA-7B model (Touvron et al., 2023). The findings indicate that mAIHT outperforms its counterparts, delivering superior pruning performance across low to moderate sparsity levels. |
| Researcher Affiliation | Academia | 1Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, China 2School of Gifted Young, University of Science and Technology of China, Hefei, China 3State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Correspondence to: Wenliang Pan <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Adaptive IHT/mAIHT algorithm for layerwise pruning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only mentions using the codebase released by other researchers for comparison: "We use the original settings of these algorithms in the codebase released by Sun et al. (2023) and Boža (2024)." |
| Open Datasets | Yes | Like Wanda (Sun et al., 2023), we randomly utilize 128 calibration samples drawn from the C4 training dataset (Raffel et al., 2020). First, we train the LLaMA-7B model on the WikiText-2 dataset (Merity et al., 2016) and perform weight pruning using different methods. The tasks include BoolQ (Clark et al., 2019), RTE (Wang, 2018), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). |
| Dataset Splits | Yes | Like Wanda (Sun et al., 2023), we randomly utilize 128 calibration samples drawn from the C4 training dataset (Raffel et al., 2020). First, we train the LLaMA-7B model on the WikiText-2 dataset (Merity et al., 2016) and perform weight pruning using different methods. The tasks include BoolQ (Clark et al., 2019), RTE (Wang, 2018), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). |
| Hardware Specification | Yes | All experiments were performed with an A100 GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It only mentions using codebases from other papers for comparison, e.g., "We use the original settings of these algorithms in the codebase released by Sun et al. (2023) and Boža (2024)," but names no versioned software for its own method. |
| Experiment Setup | Yes | We set the step size α = α₁ = α₂ = 0.95/‖XᵀX‖₂, the ℓ₂ penalty coefficient µ = 0.1 (as defined in (5)), the number of mAIHT iterations t₁ = 50, and the number of weight refining iterations t₂ = 30. |
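The quoted hyperparameters map naturally onto a plain iterative-hard-thresholding (IHT) loop. The sketch below is *not* the authors' mAIHT (the paper's adaptive variant is not reproduced here); it is a minimal NumPy illustration of vanilla IHT for layerwise pruning, under the assumption that the objective is ‖XW − XW₀‖²_F + µ‖W‖²_F with the step size 0.95/‖XᵀX‖₂ reported above. All function names are hypothetical.

```python
import numpy as np

def hard_threshold(W, k):
    """Keep the k largest-magnitude entries of W, zero out the rest."""
    flat = np.abs(W).ravel()
    if k >= flat.size:
        return W.copy()
    thresh = np.partition(flat, -k)[-k]   # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

def iht_prune(X, W0, k, mu=0.1, t=50):
    """Vanilla IHT on the assumed layerwise objective
    ||X W - X W0||_F^2 + mu ||W||_F^2, keeping k nonzero weights."""
    G = X.T @ X                           # Gram matrix of calibration inputs
    alpha = 0.95 / np.linalg.norm(G, 2)   # step size from the paper's setup
    W = hard_threshold(W0, k)             # start from magnitude pruning
    for _ in range(t):
        grad = G @ (W - W0) + mu * W      # gradient of the quadratic objective
        W = hard_threshold(W - alpha * grad, k)
    return W
```

Each iterate stays exactly k-sparse, and with a step size below 1/‖XᵀX + µI‖₂ the objective is non-increasing, so the result should never be worse than one-shot magnitude pruning under this objective.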