Numerical Pruning for Efficient Autoregressive Models
Authors: Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs. |
| Researcher Affiliation | Collaboration | ¹Northeastern University, ²Adobe Research, ³University of Pennsylvania, ⁴Middle Tennessee State University, ⁵Monash University |
| Pseudocode | Yes | Algorithm 1: Numerical Score with Newton's Method |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We compare the perplexity of the models on the WikiText2 (Merity et al. 2016), PTB (Marcus, Santorini, and Marcinkiewicz 1993), and C4 (Raffel et al. 2020) datasets with the 2048 sequence length. We also follow LLM-Pruner to evaluate the zero-shot accuracy on common sense reasoning zero-shot classification datasets including BoolQ (Clark et al. 2019), PIQA (Bisk et al. 2020), HellaSwag (Zellers et al. 2019), WinoGrande (Sakaguchi et al. 2021), ARC-easy (Clark et al. 2018), ARC-challenge (Clark et al. 2018), and OpenbookQA (Mihaylov et al. 2018). ... As for the image generation tasks, we adopt the LlamaGen (Sun et al. 2024) model family with LlamaGen-XXL and LlamaGen-3B to verify the effectiveness of our method on image generation tasks. ... on the ImageNet dataset (Deng et al. 2009). |
| Dataset Splits | No | The paper mentions using 128 samples from the training dataset of WikiText2 for compensation and generating 128 images for each class of ImageNet for the numerical score and compensation, but it does not provide explicit train/test/validation splits for the main evaluation of the models. |
| Hardware Specification | Yes | The results are obtained using an NVIDIA A100 GPU with a sentence consisting of 64 tokens as the model input. |
| Software Dependencies | No | The paper mentions using "ADM's TensorFlow scripts (Dhariwal and Nichol 2021)" but does not specify any version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | We compare the perplexity of the models on the WikiText2 ... with the 2048 sequence length. ... We adopt 128 samples from the training dataset of WikiText2 to compute the numerical score and compensate the pruned models. ... The results are obtained using an NVIDIA A100 GPU with a sentence consisting of 64 tokens as the model input. ... LlamaGen-XXL (cfg=1.75) and LlamaGen-3B (cfg=1.65) models on ImageNet with 384×384 resolution. |
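The paper's Algorithm 1 ("Numerical Score with Newton's Method") is only named in the extracted rows above, so its exact formulation is not available here. For reference, the core Newton iteration that such a score computation would build on is the classic root-finding update below; the function, derivative, and starting point are purely illustrative and not taken from the paper:

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    """Newton's method root finder: x_{k+1} = x_k - f(x_k) / f'(x_k).

    f  : callable, the function whose root we seek
    df : callable, its derivative
    x0 : float, initial guess
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # converged: |f(x)| is effectively zero
            return x
        x = x - fx / df(x)  # Newton update step
    return x

# Illustrative use: solve x^2 - 2 = 0, whose positive root is sqrt(2)
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Newton's method converges quadratically near a simple root, which is presumably why the authors chose it for an iterative score computation, though the quantity their algorithm solves for cannot be inferred from the excerpts above.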