BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training
Authors: Chenyi Yang, Wenjie Nie, Yuxin Zhang, Yuhang Wu, Xiawu Zheng, Guannan Jiang, Rongrong Ji
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results substantiate the effectiveness of BAME, showing it performs comparably to or better than previous works that fully maintain dense backward propagation during training. For instance, BAME attains a 72.0% top-1 accuracy while training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5%, despite achieving a 2.37× training FLOPs reduction. We conduct extensive experiments validating the effectiveness and efficiency of BAME for N:M sparse training. The results show that BAME achieves state-of-the-art performance when training N:M sparse networks across a wide range of sparse patterns, datasets, and prevailing DNNs, even with much fewer training FLOPs compared with existing work. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University 2Contemporary Amperex Technology Co., Limited (CATL) 3Institute of Artificial Intelligence, Xiamen University. Correspondence to: Rongrong Ji <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: BAME for N:M Sparse Training. Require: weights W; initial and final iterations for mask adaptation t_i and t_f; update interval T. Output: sparse weights W. |
| Open Source Code | Yes | Code is released at https://github.com/BAME-xmu/BAME. |
| Open Datasets | Yes | We validate the effectiveness of BAME by using it to train N:M sparse networks on image classification tasks on the CIFAR-10 (Krizhevsky et al., 2009) and ImageNet-1K (Deng et al., 2009) datasets. |
| Dataset Splits | Yes | For the networks, we sparsify ResNet-32 (He et al., 2016) and MobileNet-V2 on the CIFAR-10 dataset, and ResNet-18 (He et al., 2016), ResNet-50 (He et al., 2016), and DeiT-small on the ImageNet-1K dataset. ... We first evaluate the efficacy of BAME for training sparse ResNet-32 and MobileNet-V2 on the CIFAR-10 dataset, which includes 50,000 training images and 10,000 validation images within 10 classes. ... For the large-scale ImageNet-1K dataset that contains over 1.2 million images for training and 50,000 images for validation in 1,000 categories |
| Hardware Specification | Yes | All experiments are implemented based on PyTorch and executed on NVIDIA Tesla A100 GPUs. |
| Software Dependencies | No | All experiments are implemented based on PyTorch and executed on NVIDIA Tesla A100 GPUs. (No specific version number for PyTorch or other software dependencies is provided.) |
| Experiment Setup | Yes | We train N:M sparse networks from scratch via the Stochastic Gradient Descent (SGD) optimizer, paired with a momentum of 0.9 and a batch size of 256. The initial learning rate is set to 0.1 and gradually decayed based on the cosine annealing scheduler. Following previous works, we train all networks for 300 epochs on CIFAR-10, with a weight decay of 0.005. On ImageNet, 120 epochs are given for ResNet and 300 epochs for DeiT-small. For the implementation of BAME, we set the LMA update interval T = 100 and 0.5 for both α and β in OBF. |
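For readers unfamiliar with the N:M sparsity pattern the paper trains under (e.g. the 1:16 sparse ResNet-50 above), the constraint is that every consecutive block of M weights keeps at most N nonzeros. The sketch below is a minimal stdlib-only illustration of selecting such a mask by weight magnitude; it is a generic baseline for the N:M constraint, not BAME's block-aware mask-evolution procedure, and the function name `nm_sparse_mask` is our own.

```python
def nm_sparse_mask(weights, n=1, m=16):
    """Return a binary mask keeping the n largest-magnitude weights
    in every consecutive block of m entries (e.g. n=1, m=16 for 1:16).

    Illustrative only: shows the N:M constraint, not BAME's mask update.
    """
    assert len(weights) % m == 0, "weight count must be divisible by m"
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # indices of the n largest-magnitude entries within this block
        keep = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n]
        for i in keep:
            mask[start + i] = 1
    return mask


# Example: 2:4 sparsity keeps the two largest-magnitude weights per block of 4.
print(nm_sparse_mask([3.0, 0.1, -5.0, 2.0], n=2, m=4))  # → [1, 0, 1, 0]
```

In actual sparse training, a mask like this would be applied to each weight tensor and periodically re-derived (the paper's Algorithm 1 updates masks every T = 100 iterations between t_i and t_f).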