An Efficient and Accurate Dynamic Sparse Training Framework Based on Parameter-Freezing

Authors: Lei Li, Haochen Yang, Jiacheng Guo, Hongkai Yu, Minghai Qin, Tianyun Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show that the model accuracy has significantly improved when combining our proposed methods. For example, compared with the previous state-of-the-art methods with the same total amount of communication cost and computation FLOPs, the accuracy increases on average by 4% and 6% in our methods for CIFAR-10 and CIFAR-100 datasets on ResNet-18, respectively. On the other hand, when targeting the same accuracy, the proposed method can reduce the communication cost by 4-8 times for different datasets with different sparsity levels.
Researcher Affiliation Collaboration Lei Li1*, Haochen Yang1*, Jiacheng Guo1, Hongkai Yu1, Minghai Qin1,2, Tianyun Zhang1 1Cleveland State University, Cleveland, USA 2Western Digital Research, Milpitas, USA
Pseudocode Yes Algorithm 1: Mask readjustment on the server with differential sparsity Algorithm 2: Parameter-freezing-based dynamic sparse training
Open Source Code Yes Code and Appendix: https://github.com/Dawns14/pffdst.git
Open Datasets Yes This paper evaluates the performance of a proposed framework, PFFDST, against established FL techniques on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) datasets using LeNet (LeCun et al. 1998) and ResNet-18 (He et al. 2016) models.
Dataset Splits Yes The datasets are partitioned across multiple clients using a Dirichlet distribution (α = 0.1) to simulate non-IID data settings. LeNet experiments are evaluated based on the highest accuracy, aligning with established FedDST (Bibikar et al. 2022) benchmarks. In the experiments using ResNet-18, we report the average accuracy with error bounds to provide a comprehensive performance assessment. These error bounds, derived from multiple experimental runs, offer a robust measure of the variability in performance and enhance the statistical significance of the reported average accuracy.
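The Dirichlet-based non-IID partition described here is a standard federated-learning setup. The sketch below shows one common way it is implemented; this is not the authors' code, and the function name, seed, and use of NumPy are illustrative assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.1, seed=0):
    """Split sample indices across clients with a Dirichlet prior.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class's samples each client receives; a small alpha (e.g. 0.1, as in
    the setup above) yields highly non-IID client datasets.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Shuffle the indices of all samples with label c.
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative fractions into split points over idx.
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]
```

Each client's index array can then back a PyTorch `Subset` of the CIFAR training set; with α = 0.1 most clients end up dominated by a handful of classes.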
Hardware Specification Yes The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A.
Software Dependencies No The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A. Although PyTorch is mentioned, a specific version number is not provided in the main text.
Experiment Setup Yes The number of training epochs is set to 3 for FedDST and 2 for PFFDST to maintain comparable FLOPs. Additional parameters include R = 10, Rend = total rounds/8, 400 clients for LeNet, 200 clients for ResNet-18, and 20 randomly selected clients per communication round. The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A. PFFDST configurations utilize two sparsity levels: s1 = (s + 1)/2 and s2 = s, with differential sparsities f1 = f2 = (1 − s)/4 to control communication overhead.
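The sparsity configuration above derives everything from a single target sparsity s. A minimal sketch of that arithmetic, assuming the differential sparsity formula is f1 = f2 = (1 − s)/4 (the helper name is illustrative, not from the paper):

```python
def pffdst_sparsity_config(s):
    """Derive PFFDST's two sparsity levels and differential sparsities
    from a target sparsity s in (0, 1), per the setup described above."""
    s1 = (s + 1) / 2       # intermediate, denser sparsity level
    s2 = s                 # final target sparsity
    f1 = f2 = (1 - s) / 4  # differential sparsity bounding readjustment
    return s1, s2, f1, f2
```

For example, a target sparsity of s = 0.8 gives s1 = 0.9, s2 = 0.8, and f1 = f2 = 0.05, so each mask readjustment may move at most 5% of the weights while the communicated model stays at least 80% sparse.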