Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Authors: Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When applied to the LLaMA2-7B model, AST reduces the perplexity and zero-shot accuracy gap between dense and 2:4 semi-structured sparse models to 0.6 and 1.16%, respectively, utilizing less than 0.4% of the pretraining tokens and GPU hours. Our work demonstrates the feasibility of deploying semi-structured sparse LLMs and offers a promising alternative for achieving highly compressed models when combined with existing quantization techniques. |
| Researcher Affiliation | Academia | Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, THBI Lab, Tsinghua-Bosch Joint ML Center, Tsinghua University EMAIL;EMAIL |
| Pseudocode | Yes | Algorithm 1: Training Process for AST |
| Open Source Code | Yes | Code https://github.com/thu-ml/Adaptive-Sparse-Trainer |
| Open Datasets | Yes | For training smaller models like the OPT and GPT2 model families, we utilized the C4 dataset (Raffel et al. 2020). For the LLaMA2-7B model, we employed a more comprehensive dataset, RedPajama-v1, which encompasses data from seven domains: Common Crawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange. |
| Dataset Splits | No | The paper mentions using the C4 dataset and RedPajama-v1 for training, and evaluating on WikiText-2 and EleutherAI's LM Harness for zero-shot tasks. While these are common datasets/benchmarks, the paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages, sample counts, or explicit references to predefined splits used for their specific training runs on C4/RedPajama) within the main text. |
| Hardware Specification | Yes | Table 6: Speedup results using TensorRT-LLM on RTX4090 and L20 GPUs with different input and output sequence lengths, measured by throughput (tokens/s). |
| Software Dependencies | No | The paper mentions "TensorRT-LLM" but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | No | Optimal hyperparameters were identified through a grid search, with the specific hyperparameters and training details provided in Section 3 of the Appendix. |
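The 2:4 semi-structured sparsity pattern referenced in the table means that in every contiguous group of four weights, exactly two are zeroed, which is the pattern accelerated by NVIDIA sparse tensor cores. The following is a minimal NumPy sketch of magnitude-based 2:4 mask construction; it is an illustration of the general pattern, not the paper's AST training procedure (the function name `two_four_mask` is ours).

```python
import numpy as np

def two_four_mask(weights: np.ndarray) -> np.ndarray:
    """Return a binary mask keeping the 2 largest-magnitude weights
    in every contiguous group of 4 (2:4 semi-structured sparsity).
    Assumes the total number of weights is divisible by 4."""
    flat = weights.reshape(-1, 4)             # group weights in fours
    order = np.argsort(np.abs(flat), axis=1)  # indices, ascending by magnitude
    mask = np.zeros_like(flat)
    rows = np.arange(flat.shape[0])[:, None]
    mask[rows, order[:, 2:]] = 1.0            # keep the top-2 per group
    return mask.reshape(weights.shape)

# Example: two groups of four weights each
w = np.array([0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4])
m = two_four_mask(w)
print(m)  # exactly two ones in each group of four
```

Applying `weights * mask` yields a tensor in the hardware-friendly 2:4 layout; AST's contribution is training the model so that accuracy survives this constraint.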