Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Authors: Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental When applied to the LLaMA2-7B model, AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively, utilizing less than 0.4% of the pretraining tokens and GPU hours. Our work demonstrates the feasibility of deploying semi-structured sparse LLMs and offers a promising alternative for achieving highly compressed models when combined with existing quantization techniques.
Researcher Affiliation Academia Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University EMAIL;EMAIL
Pseudocode Yes Algorithm 1: Training Process for AST
Open Source Code Yes Code https://github.com/thu-ml/Adaptive-Sparse-Trainer
Open Datasets Yes For training smaller models like the OPT and GPT2 model families, we utilized the C4 dataset (Raffel et al. 2020). For the LLaMA2-7B model, we employed a more comprehensive dataset, RedPajama-v1, which encompasses data from seven domains: Common Crawl, C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange.
Dataset Splits No The paper mentions training on the C4 and RedPajama-v1 datasets and evaluating on WikiText-2 and EleutherAI's LM Harness for zero-shot tasks. While these are common datasets and benchmarks, the main text does not explicitly specify the training/validation/test splits used for their training runs (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification Yes Table 6: Speedup results using TensorRT-LLM on RTX4090 and L20 GPUs with different input and output sequence lengths, measured by throughput (tokens/s).
Software Dependencies No The paper mentions "TensorRT-LLM" but does not provide a specific version number for it or for any other software dependency.
Experiment Setup No Optimal hyperparameters were identified through a grid search, with the specific hyperparameters and training details provided in Section 3 of the Appendix.
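The 2:4 semi-structured sparsity pattern at the heart of the paper keeps the two largest-magnitude weights in every contiguous group of four, which is the pattern hardware sparse tensor cores can accelerate. The following is a minimal illustrative sketch of one-shot magnitude-based 2:4 masking in plain Python; note that AST itself learns the sparse pattern adaptively during training rather than fixing it once, so `prune_2_4` here is a hypothetical helper for illustration only.

```python
def prune_2_4(row):
    """Apply a 2:4 sparsity pattern to a flat list of weights.

    In each contiguous group of 4 weights, keep the 2 with the largest
    magnitude and zero out the other 2 (one-shot magnitude pruning).
    """
    pruned = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        # Indices of the 2 largest-magnitude weights in this group.
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


weights = [0.1, -0.5, 0.3, 0.05, 2.0, -0.1, 0.0, 1.5]
sparse = prune_2_4(weights)
# Each group of 4 now contains exactly 2 nonzero weights:
# [0.0, -0.5, 0.3, 0.0, 2.0, 0.0, 0.0, 1.5]
```

The resulting pattern is what makes 2:4 sparsity "semi-structured": it is fine-grained enough to preserve accuracy better than removing whole channels, yet regular enough for the speedups the paper reports in Table 6.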