BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training

Authors: Chenyi Yang, Wenjie Nie, Yuxin Zhang, Yuhang Wu, Xiawu Zheng, Guannan Jiang, Rongrong Ji

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical results substantiate the effectiveness of BAME, showing that it performs comparably to or better than previous works that fully maintain dense backward propagation during training. For instance, BAME attains 72.0% top-1 accuracy when training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5% despite achieving a 2.37× training FLOPs reduction. We conduct extensive experiments validating the effectiveness and efficiency of BAME for N:M sparse training. The results show that BAME achieves state-of-the-art performance when training N:M sparse networks across a wide range of sparse patterns, datasets, and prevailing DNNs, even with far fewer training FLOPs than existing work.
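For context, a 1:16 sparse network keeps only 1 non-zero weight in every block of 16 consecutive weights. A minimal sketch of such an N:M mask using plain magnitude-based selection (a common baseline; BAME's actual mask-evolution criterion differs):

```python
import numpy as np

def nm_sparse_mask(w, n=1, m=16):
    """Keep the n largest-magnitude weights in each block of m consecutive weights."""
    flat = w.reshape(-1, m)                          # group weights into blocks of m
    topn = np.argsort(-np.abs(flat), axis=1)[:, :n]  # indices of n largest |w| per block
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, topn, True, axis=1)
    return mask.reshape(w.shape)

w = np.random.randn(2, 32)          # 64 weights -> four blocks of 16
mask = nm_sparse_mask(w, n=1, m=16)
# mask keeps exactly one weight per 16-weight block (4 non-zeros total)
```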
Researcher Affiliation Collaboration 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University 2Contemporary Amperex Technology Co., Limited (CATL) 3Institute of Artificial Intelligence, Xiamen University. Correspondence to: Rongrong Ji <EMAIL>.
Pseudocode Yes Algorithm 1: BAME for N:M Sparse Training. Require: weights W; initial and final iterations for mask adaptation t_i and t_f; update interval T. Output: sparse weights W.
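The schedule implied by Algorithm 1's inputs can be sketched as a training loop that re-derives the mask every T iterations while t_i ≤ t ≤ t_f, then freezes it. The mask criterion below is plain per-block magnitude, standing in for BAME's block-aware score; `grad_fn` and all hyperparameter values are illustrative:

```python
import numpy as np

def topn_mask(w, n, m):
    # magnitude-based N:M mask: keep the n largest |w| per block of m
    flat = np.abs(w).reshape(-1, m)
    idx = np.argsort(-flat, axis=1)[:, :n]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask.reshape(w.shape).astype(w.dtype)

def sparse_train(W, grad_fn, steps, t_i, t_f, T, lr=0.1, n=2, m=4):
    """Schedule skeleton of Algorithm 1: adapt the mask every T iterations
    between t_i and t_f, then keep it fixed for the rest of training."""
    mask = topn_mask(W, n, m)
    for t in range(steps):
        if t_i <= t <= t_f and t % T == 0:
            mask = topn_mask(W, n, m)      # mask-adaptation phase
        W = W - lr * grad_fn(W * mask) * mask  # update only unmasked weights
    return W * mask

W = np.random.randn(4, 8)
out = sparse_train(W, lambda w: 2 * w, steps=50, t_i=0, t_f=30, T=10)
# every 4-weight block of `out` ends with at most 2 non-zeros (2:4 sparsity)
```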
Open Source Code Yes Code is released at https://github.com/BAME-xmu/BAME.
Open Datasets Yes We validate the effectiveness of BAME by using it to train N:M sparse networks on image classification tasks on the CIFAR-10 (Krizhevsky et al., 2009) and ImageNet-1K (Deng et al., 2009) datasets.
Dataset Splits Yes For the networks, we sparsify ResNet-32 (He et al., 2016) and MobileNet-V2 on the CIFAR-10 dataset, and ResNet-18 (He et al., 2016), ResNet-50 (He et al., 2016), and DeiT-small on the ImageNet-1K dataset. ... We first evaluate the efficacy of BAME for training sparse ResNet-32 and MobileNet-V2 on the CIFAR-10 dataset, which includes 50,000 training images and 10,000 validation images across 10 classes. ... For the large-scale ImageNet-1K dataset, which contains over 1.2 million training images and 50,000 validation images in 1,000 categories
Hardware Specification Yes All experiments are implemented in PyTorch and executed on NVIDIA Tesla A100 GPUs.
Software Dependencies No All experiments are implemented in PyTorch and executed on NVIDIA Tesla A100 GPUs. (No specific version numbers for PyTorch or other software dependencies are provided.)
Experiment Setup Yes We train N:M sparse networks from scratch via the Stochastic Gradient Descent (SGD) optimizer with a momentum of 0.9 and a batch size of 256. The initial learning rate is set to 0.1 and gradually decayed by a cosine annealing scheduler. Following previous works, we train all networks for 300 epochs on CIFAR-10 with a weight decay of 0.005. On ImageNet, ResNet is trained for 120 epochs and DeiT-small for 300 epochs. For the implementation of BAME, we set the LMA update interval T = 100 and set both α and β in OBF to 0.5.
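The cosine annealing schedule named in the setup decays the learning rate from its initial value toward zero over training. A minimal sketch of that decay, using the paper's initial rate of 0.1 and the 120-epoch ResNet/ImageNet budget (standard cosine formula, not taken from the paper's code):

```python
import math

def cosine_lr(epoch, total_epochs, lr0=0.1):
    """Cosine-annealed learning rate: starts at lr0, decays smoothly to 0."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))

# 120-epoch schedule as in the ResNet/ImageNet setup
print(cosine_lr(0, 120))    # 0.1 at the start
print(cosine_lr(60, 120))   # 0.05 at the midpoint
print(cosine_lr(120, 120))  # ~0.0 at the end
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` wrapped around the SGD optimizer described above.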