SlimLLM: Accurate Structured Pruning for Large Language Models

Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Based on the LLaMA benchmark results, our SlimLLM outperforms other methods and achieves state-of-the-art performance. (Page 1); see also Section 5, Experiments (Page 5)
Researcher Affiliation Industry 1Huawei Noah's Ark Lab, China. Correspondence to: Xinghao Chen <EMAIL>, Yunhe Wang <EMAIL>.
Pseudocode Yes Algorithm 1 Greedy Search for head pruning
Open Source Code No The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets Yes We evaluate the performance of the model by performing zero-shot task classification on common sense reasoning datasets, which follow the setting of LLM-Pruner (Ma et al., 2023), including BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC-easy (Clark et al., 2018), ARC-challenge (Clark et al., 2018) and OpenbookQA (Mihaylov et al., 2018). Meanwhile, a zero-shot perplexity (PPL) evaluation is also conducted on the WikiText2 (Merity et al., 2016) and PTB datasets (Marcus et al., 1993).
Dataset Splits Yes For the calculation of importance scores, we randomly selected 32 samples from BookCorpus, and the sequence length of each sample is 128.
Hardware Specification Yes All latency measurements were conducted on a single NVIDIA V100 GPU.
Software Dependencies No The paper mentions using LoRA for fine-tuning but does not provide specific version numbers for any software dependencies like programming languages or libraries.
Experiment Setup Yes When fine-tuning, we use a single GPU and train for 2 epochs on the cleaned version of Alpaca (Taori et al., 2023), retaining the same settings as LLM-Pruner. We fine-tune the pruned model with LoRA. The learning rate is set to 1e-4, and the batch size is 64. For the pruning ratio assigned to each layer, when the pruning ratio is set at 20%, the parameter α in Equation 11 is configured to be 10. When the pruning ratio is increased to 50%, we correspondingly decrease the parameter value, setting it to 7.
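The pseudocode noted above (Algorithm 1, "Greedy Search for head pruning") is not reproduced in this report, but its general shape can be sketched. This is an illustrative simplification: here the set score is just the sum of per-head importance values, so greedy selection reduces to dropping the lowest-scoring heads, whereas the paper's algorithm re-evaluates candidate subsets jointly at each step.

```python
def greedy_prune_heads(head_scores, num_prune):
    """Greedy head-pruning sketch (not the paper's exact objective).

    head_scores: per-head importance values for one attention layer.
    num_prune:   how many heads to remove.
    Returns the sorted indices of the pruned heads.
    """
    pruned = []
    remaining = set(range(len(head_scores)))
    for _ in range(num_prune):
        # Remove the head whose absence hurts the (additive) set score least.
        worst = min(remaining, key=lambda h: head_scores[h])
        pruned.append(worst)
        remaining.remove(worst)
    return sorted(pruned)

print(greedy_prune_heads([0.9, 0.1, 0.5, 0.3], 2))  # prints [1, 3]
```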
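The zero-shot perplexity metric reported on WikiText2 and PTB is the standard one: the exponential of the mean per-token negative log-likelihood. A minimal sketch of the metric itself (model inference omitted):

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods: exp(mean NLL)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning probability 1/10 to every token has PPL 10.
print(round(perplexity([math.log(10)] * 128), 6))  # prints 10.0
```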
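The calibration setup for the importance scores (32 random BookCorpus samples, sequence length 128) can be sketched as random fixed-length windows over a token stream. The corpus and the windowing strategy here are illustrative stand-ins, not the paper's exact sampling code:

```python
import random

def sample_calibration_batch(corpus_ids, num_samples=32, seq_len=128, seed=0):
    """Pick random fixed-length token windows for importance-score calibration."""
    rng = random.Random(seed)
    max_start = len(corpus_ids) - seq_len
    starts = [rng.randrange(max_start + 1) for _ in range(num_samples)]
    return [corpus_ids[s:s + seq_len] for s in starts]

# Toy token stream standing in for tokenized BookCorpus text.
corpus = list(range(10_000))
batch = sample_calibration_batch(corpus)
print(len(batch), len(batch[0]))  # prints 32 128
```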
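The reported fine-tuning settings (batch size 64, 2 epochs, LoRA, learning rate 1e-4) imply a concrete optimizer-step budget. The size of the cleaned Alpaca set is an assumption here (roughly 52k instruction samples); the paper does not state it:

```python
def finetune_step_budget(dataset_size, batch_size=64, epochs=2):
    """Optimizer steps implied by the reported settings (batch 64, 2 epochs).

    dataset_size is an assumption for the cleaned Alpaca set (~52k samples).
    """
    steps_per_epoch = -(-dataset_size // batch_size)  # ceiling division
    return epochs * steps_per_epoch

print(finetune_step_budget(52_000))  # prints 1626
```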