Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Authors: Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to the LLaMA2-7B model, AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively, utilizing less than 0.4% of the pretraining tokens and GPU hours. Our work demonstrates the feasibility of deploying semi-structured sparse LLMs and offers a promising alternative for achieving highly compressed models when combined with existing quantization techniques. |
| Researcher Affiliation | Academia | Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University EMAIL;EMAIL |
| Pseudocode | Yes | Algorithm 1: Training Process for AST |
| Open Source Code | Yes | Code https://github.com/thu-ml/Adaptive-Sparse-Trainer |
| Open Datasets | Yes | For training smaller models like the OPT and GPT2 model families, we utilized the C4 dataset (Raffel et al. 2020). For the LLaMA2-7B model, we employed a more comprehensive dataset, RedPajama-v1, which encompasses data from seven domains: Common Crawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange. |
| Dataset Splits | No | The paper mentions using the C4 dataset and RedPajama-v1 for training, and evaluating on WikiText-2 and EleutherAI's LM Harness for zero-shot tasks. While these are common datasets/benchmarks, the paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits used for their specific training runs on C4/RedPajama) within the main text. |
| Hardware Specification | Yes | Table 6: Speedup results using TensorRT-LLM on RTX4090 and L20 GPUs with different input and output sequence lengths, measured by throughput (tokens/s). |
| Software Dependencies | No | The paper mentions "TensorRT-LLM" but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | No | Optimal hyperparameters were identified through a grid search, with the specific hyperparameters and training details provided in Section 3 of the Appendix. |
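The 2:4 semi-structured sparsity pattern referenced in the table means that in every contiguous group of four weights, exactly two are zeroed, which is the pattern accelerated by NVIDIA sparse tensor cores. The following is a minimal NumPy sketch of magnitude-based 2:4 mask construction; it is an illustration of the general pattern, not the paper's AST training procedure (the function name `two_four_mask` is ours).

```python
import numpy as np

def two_four_mask(weights: np.ndarray) -> np.ndarray:
    """Return a binary mask keeping the 2 largest-magnitude weights
    in every contiguous group of 4 (2:4 semi-structured sparsity).
    Assumes the total number of weights is divisible by 4."""
    flat = weights.reshape(-1, 4)             # group weights in fours
    order = np.argsort(np.abs(flat), axis=1)  # indices, ascending by magnitude
    mask = np.zeros_like(flat)
    rows = np.arange(flat.shape[0])[:, None]
    mask[rows, order[:, 2:]] = 1.0            # keep the top-2 per group
    return mask.reshape(weights.shape)

# Example: two groups of four weights each
w = np.array([0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4])
m = two_four_mask(w)
print(m)  # exactly two ones in each group of four
```

Applying `weights * mask` yields a tensor in the hardware-friendly 2:4 layout; AST's contribution is training the model so that accuracy survives this constraint.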