SlimLLM: Accurate Structured Pruning for Large Language Models
Authors: Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on the LLaMA benchmark results, our SlimLLM outperforms other methods and achieves state-of-the-art performance. (Page 1) and 5. Experiments (Page 5) |
| Researcher Affiliation | Industry | 1Huawei Noah's Ark Lab, China. Correspondence to: Xinghao Chen <EMAIL>, Yunhe Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Greedy Search for head pruning |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of the model by performing zero-shot task classification on common sense reasoning datasets, which follow the setting of LLM-pruner (Ma et al., 2023), including BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC-easy (Clark et al., 2018), ARC-challenge (Clark et al., 2018) and OpenBookQA (Mihaylov et al., 2018). Meanwhile, a zero-shot perplexity (PPL) evaluation is also conducted on the WikiText2 (Merity et al., 2016) and PTB datasets (Marcus et al., 1993). |
| Dataset Splits | Yes | For the calculation of importance score, we randomly selected 32 samples from Bookcorpus, and the sequence length of each sample is 128. |
| Hardware Specification | Yes | All latency measurements were conducted on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using LoRA for fine-tuning but does not provide specific version numbers for any software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | When finetuning, we use a single GPU with 2 epochs on cleaned version of Alpaca (Taori et al., 2023), retaining the same settings as LLM-pruner. We finetune the pruned model with LoRA. The learning rate is set to 1e-4, and the batch size is 64. For the pruning ratio assigned to each layer, when the pruning ratio is set at 20%, the parameter α in Equation 11 is configured to be 10. When the pruning ratio is increased to 50%, we correspondingly decrease the parameter value, setting it to 7. |
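The Experiment Setup row above can be summarized as a small configuration sketch. This is a hypothetical reconstruction for reproducibility purposes, not the authors' code: the variable and function names are illustrative, and only the two α settings explicitly reported in the paper are encoded.

```python
# Hypothetical config capturing the reported fine-tuning hyperparameters.
# Names are illustrative; the paper does not release code.
finetune_config = {
    "method": "LoRA",           # fine-tuning method stated in the paper
    "epochs": 2,                # 2 epochs on cleaned Alpaca
    "dataset": "alpaca-cleaned",
    "learning_rate": 1e-4,
    "batch_size": 64,
    "gpus": 1,                  # "a single GPU"
}

def alpha_for_pruning_ratio(ratio: float) -> int:
    """Return the reported α (Equation 11 of the paper) for a pruning ratio.

    Only the two settings stated in the paper are known; the mapping for
    other ratios is not reported.
    """
    if ratio == 0.2:
        return 10
    if ratio == 0.5:
        return 7
    raise ValueError("alpha not reported for this pruning ratio")
```

A reproduction attempt at intermediate pruning ratios would need to choose α itself, since the paper only reports the 20% and 50% settings.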