SlimLLM: Accurate Structured Pruning for Large Language Models
Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on the LLaMA benchmark results, our SlimLLM outperforms other methods and achieves state-of-the-art performance. (Page 1) and 5. Experiments (Page 5) |
| Researcher Affiliation | Industry | 1Huawei Noah's Ark Lab, China. Correspondence to: Xinghao Chen <EMAIL>, Yunhe Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Greedy Search for head pruning |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of the model by performing zero-shot task classification on common sense reasoning datasets, which follow the setting of LLM-pruner (Ma et al., 2023), including BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC-easy (Clark et al., 2018), ARC-challenge (Clark et al., 2018) and OpenBookQA (Mihaylov et al., 2018). Meanwhile, a zero-shot perplexity (PPL) evaluation is also conducted on the WikiText2 (Merity et al., 2016) and PTB datasets (Marcus et al., 1993). |
| Dataset Splits | Yes | For the calculation of importance score, we randomly selected 32 samples from Bookcorpus, and the sequence length of each sample is 128. |
| Hardware Specification | Yes | All latency measurements were conducted on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using LoRA for fine-tuning but does not provide specific version numbers for any software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | When finetuning, we use a single GPU with 2 epochs on cleaned version of Alpaca (Taori et al., 2023), retaining the same settings as LLM-pruner. We finetune the pruned model with LoRA. The learning rate is set to 1e-4, and the batch size is 64. For the pruning ratio assigned to each layer, when the pruning ratio is set at 20%, the parameter α in Equation 11 is configured to be 10. When the pruning ratio is increased to 50%, we correspondingly decrease the parameter value, setting it to 7. |
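The Experiment Setup row above can be summarized as a small configuration sketch. This is a hypothetical reconstruction for reproducibility purposes, not the authors' code: the variable and function names are illustrative, and only the two α settings explicitly reported in the paper are encoded.

```python
# Hypothetical config capturing the reported fine-tuning hyperparameters.
# Names are illustrative; the paper does not release code.
finetune_config = {
    "method": "LoRA",           # fine-tuning method stated in the paper
    "epochs": 2,                # 2 epochs on cleaned Alpaca
    "dataset": "alpaca-cleaned",
    "learning_rate": 1e-4,
    "batch_size": 64,
    "gpus": 1,                  # "a single GPU"
}

def alpha_for_pruning_ratio(ratio: float) -> int:
    """Return the reported α (Equation 11 of the paper) for a pruning ratio.

    Only the two settings stated in the paper are known; the mapping for
    other ratios is not reported.
    """
    if ratio == 0.2:
        return 10
    if ratio == 0.5:
        return 7
    raise ValueError("alpha not reported for this pruning ratio")
```

A reproduction attempt at intermediate pruning ratios would need to choose α itself, since the paper only reports the 20% and 50% settings.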