BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training
Authors: Chenyi Yang, Wenjie Nie, Yuxin Zhang, Yuhang Wu, Xiawu Zheng, Guannan Jiang, Rongrong Ji
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results substantiate the effectiveness of BAME, showing it performs comparably to or better than previous works that fully maintain dense backward propagation during training. For instance, BAME attains a 72.0% top-1 accuracy while training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5%, despite achieving a 2.37× training FLOPs reduction. We conduct extensive experiments validating the effectiveness and efficiency of BAME for N:M sparse training. The results show that BAME achieves state-of-the-art performance when training N:M sparse networks across a wide range of sparse patterns, datasets, and prevailing DNNs, even with much fewer training FLOPs compared with existing work. |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University 2Contemporary Amperex Technology Co., Limited (CATL) 3Institute of Artificial Intelligence, Xiamen University. Correspondence to: Rongrong Ji <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: BAME for N:M Sparse Training. Require: weights W; initial and final iterations for mask adaptation t_i and t_f; update interval T. Output: sparse weights W. |
| Open Source Code | Yes | Code is released at https://github.com/BAME-xmu/BAME. |
| Open Datasets | Yes | We validate the effectiveness of BAME by using it to train N:M sparse networks on image classification tasks on the CIFAR-10 (Krizhevsky et al., 2009) and ImageNet-1K (Deng et al., 2009) datasets. |
| Dataset Splits | Yes | For the networks, we sparsify ResNet-32 (He et al., 2016) and MobileNet-V2 on the CIFAR-10 dataset, and ResNet-18 (He et al., 2016), ResNet-50 (He et al., 2016), and DeiT-small on the ImageNet-1K dataset. ... We first evaluate the efficacy of BAME for training sparse ResNet-32 and MobileNet-V2 on the CIFAR-10 dataset, which includes 50,000 training images and 10,000 validation images within 10 classes. ... For the large-scale ImageNet-1K dataset that contains over 1.2 million images for training and 50,000 images for validation in 1,000 categories |
| Hardware Specification | Yes | All experiments are implemented based on PyTorch and executed on NVIDIA Tesla A100 GPUs. |
| Software Dependencies | No | All experiments are implemented based on PyTorch and executed on NVIDIA Tesla A100 GPUs. (No specific version number for PyTorch or other software dependencies is provided.) |
| Experiment Setup | Yes | We train N:M sparse networks from scratch via the Stochastic Gradient Descent (SGD) optimizer, paired with a momentum of 0.9 and a batch size of 256. The initial learning rate is set to 0.1 and gradually decayed based on the cosine annealing scheduler. Following previous works, we train all networks for 300 epochs on CIFAR-10, with a weight decay of 0.005. On ImageNet, 120 epochs are given for ResNet and 300 epochs for DeiT-small. For the implementation of BAME, we set the LMA update interval T = 100 and 0.5 for both α and β in OBF. |
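For readers unfamiliar with the N:M sparsity pattern the paper trains under (e.g. the 1:16 sparse ResNet-50 above), the constraint is that every consecutive block of M weights keeps at most N nonzeros. The sketch below is a minimal stdlib-only illustration of selecting such a mask by weight magnitude; it is a generic baseline for the N:M constraint, not BAME's block-aware mask-evolution procedure, and the function name `nm_sparse_mask` is our own.

```python
def nm_sparse_mask(weights, n=1, m=16):
    """Return a binary mask keeping the n largest-magnitude weights
    in every consecutive block of m entries (e.g. n=1, m=16 for 1:16).

    Illustrative only: shows the N:M constraint, not BAME's mask update.
    """
    assert len(weights) % m == 0, "weight count must be divisible by m"
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # indices of the n largest-magnitude entries within this block
        keep = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n]
        for i in keep:
            mask[start + i] = 1
    return mask


# Example: 2:4 sparsity keeps the two largest-magnitude weights per block of 4.
print(nm_sparse_mask([3.0, 0.1, -5.0, 2.0], n=2, m=4))  # → [1, 0, 1, 0]
```

In actual sparse training, a mask like this would be applied to each weight tensor and periodically re-derived (the paper's Algorithm 1 updates masks every T = 100 iterations between t_i and t_f).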