An Efficient and Accurate Dynamic Sparse Training Framework Based on Parameter-Freezing
Authors: Lei Li, Haochen Yang, Jiacheng Guo, Hongkai Yu, Minghai Qin, Tianyun Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the model accuracy has significantly improved when combining our proposed methods. For example, compared with the previous state-of-the-art methods with the same total amount of communication cost and computation FLOPs, the accuracy increases on average by 4% and 6% in our methods for CIFAR-10 and CIFAR-100 datasets on ResNet-18, respectively. On the other hand, when targeting the same accuracy, the proposed method can reduce the communication cost by 4-8 times for different datasets with different sparsity levels. |
| Researcher Affiliation | Collaboration | Lei Li1*, Haochen Yang1*, Jiacheng Guo1, Hongkai Yu1, Minghai Qin1,2, Tianyun Zhang1. 1Cleveland State University, Cleveland, USA; 2Western Digital Research, Milpitas, USA |
| Pseudocode | Yes | Algorithm 1: Mask readjustment on the server with differential sparsity; Algorithm 2: Parameter-freezing-based dynamic sparse training |
| Open Source Code | Yes | Code and Appendix: https://github.com/Dawns14/pffdst.git |
| Open Datasets | Yes | This paper evaluates the performance of a proposed framework, PFFDST, against established FL techniques on CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CIFAR-100 (Krizhevsky, Hinton et al. 2009) datasets using LeNet (LeCun et al. 1998) and ResNet-18 (He et al. 2016) models. |
| Dataset Splits | Yes | The datasets are partitioned across multiple clients using a Dirichlet distribution (α = 0.1) to simulate non-IID data settings. LeNet experiments are evaluated based on the highest accuracy, aligning with established FedDST (Bibikar et al. 2022) benchmarks. In the experiments using ResNet-18, we report the average accuracy with error bounds to provide a comprehensive performance assessment. These error bounds, derived from multiple experimental runs, offer a robust measure of the variability in performance and enhance the statistical significance of the reported average accuracy. |
| Hardware Specification | Yes | The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A. |
| Software Dependencies | No | The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A. Although PyTorch is mentioned, a specific version number is not provided in the main text. |
| Experiment Setup | Yes | The number of training epochs is set to 3 for FedDST and 2 for PFFDST to maintain comparable FLOPs. Additional parameters include R = 10, R_end = total rounds / 8, 400 clients for LeNet, 200 clients for ResNet-18, and 20 randomly selected clients per communication round. The implementation leverages PyTorch (Paszke et al. 2019) on a server equipped with 8 A6000 GPUs, with detailed hyperparameter settings outlined in Appendix A. PFFDST configurations utilize two sparsity levels: s1 = (s + 1)/2 and s2 = s, with differential sparsities f1 = f2 = (1 − s)/4 to control communication overhead. |
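The sparsity settings quoted in the Experiment Setup row can be sketched as a small helper. This is a minimal illustration of the reported formulas only, not the paper's released code; the function and variable names are our own, and the concrete values of s and the total round count are assumptions for the example.

```python
def pffdst_sparsity_config(s: float, total_rounds: int) -> dict:
    """Compute the PFFDST hyperparameters as stated in the report.

    s: target overall sparsity, expected in (0, 1).
    total_rounds: total number of communication rounds.
    """
    s1 = (s + 1) / 2           # first (intermediate) sparsity level
    s2 = s                     # second sparsity level equals the target
    f1 = f2 = (1 - s) / 4      # differential sparsities bounding communication overhead
    R = 10                     # mask-readjustment interval in rounds
    R_end = total_rounds // 8  # round index derived from "R_end = total rounds / 8"
    return {"s1": s1, "s2": s2, "f1": f1, "f2": f2, "R": R, "R_end": R_end}

# Example with an assumed 80% target sparsity over 800 rounds.
cfg = pffdst_sparsity_config(s=0.8, total_rounds=800)
print(cfg)
```

For s = 0.8 this gives s1 = 0.9 and f1 = f2 = 0.05, i.e. the intermediate level sits halfway between the target sparsity and a dense model, while the differential sparsity shrinks as the target sparsity grows.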