PROXSPARSE: REGULARIZED LEARNING OF SEMI-STRUCTURED SPARSITY MASKS FOR PRETRAINED LLMS

Authors: Hongyi Liu, Rajarshi Saha, Zhen Jia, Youngsuk Park, Jiaji Huang, Shoham Sabach, Yu-Xiang Wang, George Karypis

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluations on 7 widely used models show that ProxSparse consistently outperforms previously proposed semi-structured mask selection methods with significant improvement, demonstrating the effectiveness of our learned approach towards semi-structured pruning. We conducted extensive experiments on 7 widely used high-performance open-source models from four model families: Mistral (Jiang et al., 2023), Qwen (Yang et al., 2024), OpenLLaMA (Geng & Liu, 2023), and Llama (Touvron et al., 2023).
Researcher Affiliation | Collaboration | 1Rice University, 2Amazon Web Services, 3Technion, 4UCSD. Correspondence to: Hongyi L. <EMAIL>, Rajarshi S. <EMAIL>, Yu-Xiang W. <EMAIL>.
Pseudocode | Yes | Algorithm 1 (ProxSparse: Proximal Gradient Descent for End-to-End 2:4-Sparsity Pruning), Algorithm 2 (ALM: Alternating Minimization), Algorithm 3 (Enum ALM for solving (6)).
Open Source Code | Yes | Code available here.
Open Datasets | Yes | For calibration, we followed Wanda (Sun et al., 2023) and SparseGPT (Frantar & Alistarh, 2023) in using the C4 (Raffel et al., 2020) dataset. Zero-shot performance was evaluated with the EleutherAI LM-Eval Harness (Gao et al., 2024) on seven widely used tasks (Liu et al., 2024), while WikiText (Merity et al., 2016) perplexity (PPL) was used as the language modeling metric, consistent with previous evaluation protocols (Sun et al., 2023; Frantar & Alistarh, 2023).
Dataset Splits | No | The experiments use 400 calibration samples unless otherwise specified, with consistent sample counts across baselines for fair comparison.
Hardware Specification | Yes | Our experiments were done on NVIDIA A100 GPUs. We utilize the NVIDIA CUTLASS library as the underlying implementation for 2:4 semi-structured sparse operations.
Software Dependencies | No | Our learning procedure follows standard settings, using AdamW as the optimizer with a warmup ratio of 0.1. (No specific versions for AdamW or CUTLASS are provided.)
Experiment Setup | Yes | Table 8 presents the configurations and hyperparameters used in our experiments. There are three key hyperparameters for learning an optimal semi-structured mask: sparsity regularization strength (λ1), frozen-weight regularization extent (λ2), and learning rate. Our learning procedure follows standard settings, using AdamW as the optimizer with a warmup ratio of 0.1.
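The algorithms listed above all target the 2:4 semi-structured pattern: at most two nonzeros in every contiguous group of four weights, the pattern that NVIDIA sparse tensor cores accelerate. As a point of reference, the simple magnitude-based 2:4 mask (the hand-crafted baseline that learned approaches like ProxSparse improve upon) can be sketched as follows; the function name and pure-Python formulation are illustrative, not from the paper:

```python
def apply_24_mask(weights):
    """Keep the two largest-magnitude entries in each group of four (2:4 sparsity).

    Minimal magnitude-based illustration of the 2:4 pattern; ProxSparse
    instead learns the mask end-to-end via proximal gradient descent.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: abs(group[j]))[-2:]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

pruned = apply_24_mask([0.1, -0.5, 0.3, 0.02, 1.0, -0.2, 0.05, 0.4])
# each group of four retains exactly two nonzeros
```

Magnitude selection is applied per group rather than globally, which is what distinguishes semi-structured (2:4) pruning from unstructured magnitude pruning at the same overall sparsity level.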
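The warmup ratio of 0.1 in the setup corresponds to a linear learning-rate ramp over the first 10% of optimizer steps. A minimal sketch of such a schedule, assuming a constant rate after warmup (the quoted text does not state the post-warmup schedule):

```python
def warmup_lr(step, total_steps, base_lr, warmup_ratio=0.1):
    """Linear learning-rate warmup over the first `warmup_ratio` of training.

    Sketch of the warmup described in the setup (AdamW, warmup ratio 0.1);
    the constant post-warmup schedule is an assumption, not from the paper.
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        # ramp linearly from base_lr / warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    return base_lr  # constant after warmup (assumed)
```

For example, with `total_steps=100` and `base_lr=1e-3`, the rate grows from 1e-4 at step 0 to 1e-3 by step 9 and stays there. In practice this is usually delegated to the training framework's scheduler rather than hand-rolled.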