Numerical Pruning for Efficient Autoregressive Models
Authors: Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs. |
| Researcher Affiliation | Collaboration | ¹Northeastern University, ²Adobe Research, ³University of Pennsylvania, ⁴Middle Tennessee State University, ⁵Monash University |
| Pseudocode | Yes | Algorithm 1: Numerical Score with Newton's Method |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We compare the perplexity of the models on the WikiText2 (Merity et al. 2016), PTB (Marcus, Santorini, and Marcinkiewicz 1993), and C4 (Raffel et al. 2020) datasets with the 2048 sequence length. We also follow LLM-Pruner to evaluate the zero-shot accuracy on common sense reasoning zero-shot classification datasets including BoolQ (Clark et al. 2019), PIQA (Bisk et al. 2020), HellaSwag (Zellers et al. 2019), WinoGrande (Sakaguchi et al. 2021), ARC-easy (Clark et al. 2018), ARC-challenge (Clark et al. 2018), and OpenbookQA (Mihaylov et al. 2018). ... As for the image generation tasks, we adopt the LlamaGen (Sun et al. 2024) model family with LlamaGen-XXL and LlamaGen-3B to verify the effectiveness of our method on image generation tasks. ... on the ImageNet dataset (Deng et al. 2009). |
| Dataset Splits | No | The paper mentions using 128 samples from the training dataset of WikiText2 for compensation and generating 128 images for each class of ImageNet for the numerical score and compensation, but it does not provide explicit train/test/validation splits for the main evaluation of the models. |
| Hardware Specification | Yes | The results are obtained using an NVIDIA A100 GPU with a sentence consisting of 64 tokens as the model input. |
| Software Dependencies | No | The paper mentions using "ADM's TensorFlow scripts (Dhariwal and Nichol 2021)" but does not specify any version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | We compare the perplexity of the models on the WikiText2 ... with the 2048 sequence length. ... We adopt 128 samples from the training dataset of WikiText2 to compute the numerical score and compensate the pruned models. ... The results are obtained using an NVIDIA A100 GPU with a sentence consisting of 64 tokens as the model input. ... LlamaGen-XXL (cfg=1.75) and LlamaGen-3B (cfg=1.65) models on ImageNet with 384×384 resolution. |
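The paper's Algorithm 1 ("Numerical Score with Newton's Method") is only named in the extracted rows above, so its exact formulation is not available here. For reference, the core Newton iteration that such a score computation would build on is the classic root-finding update below; the function, derivative, and starting point are purely illustrative and not taken from the paper:

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's method root finder: x_{k+1} = x_k - f(x_k) / f'(x_k).

    f  : callable, the function whose root we seek
    df : callable, its derivative
    x0 : float, initial guess
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # converged: |f(x)| is effectively zero
            return x
        x = x - fx / df(x)  # Newton update step
    return x

# Illustrative use: solve x^2 - 2 = 0, whose positive root is sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Newton's method converges quadratically near a simple root, which is presumably why the authors chose it for an iterative score computation, though the quantity their algorithm solves for cannot be inferred from the excerpts above.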