FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking
Authors: Changlong Shi, Jinmeng Li, He Zhao, Dandan Guo, Yi Chang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments under diverse scenarios demonstrate the superiority of our method over several state-of-the-art approaches, providing a promising tool for enhancing the global model in FL. ... We conduct extensive experiments under diverse scenarios to demonstrate that FedLWS brings considerable accuracy gains over the state-of-the-art FL approaches. ... Table 1: Top-1 test accuracy (%) on four datasets with three different degrees of heterogeneity. ... Figure 1: Empirical observations on CIFAR-10 with CNN as the backbone; see more results in Appendix B.2. |
| Researcher Affiliation | Academia | Changlong Shi¹, Jinmeng Li¹, He Zhao², Dandan Guo¹, Yi Chang¹ ³ ⁴. ¹School of Artificial Intelligence, Jilin University; ²CSIRO's Data61; ³International Center of Future Science, Jilin University; ⁴Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Having introduced our adaptive way of setting γt l , we give an overview of our method, FedLWS, in Figure 2. The pseudo-code of FedLWS is shown in Appendix C.1, Algorithm 1, where we highlight the additional steps required by our method compared to FedAvg. |
| Open Source Code | Yes | The source code is available at https://github.com/ChanglongShi/FedLWS |
| Open Datasets | Yes | Dataset and Baselines. In this paper, we consider four image classification datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and Tiny-ImageNet (Chrabaszcz et al., 2017); and text classification datasets: AG News (Zhang et al., 2015), Sogou News (Zhang et al., 2015), and Amazon Review (Ben-David et al., 2006). |
| Dataset Splits | Yes | Federated Simulation. To emulate the FL scenario, we randomly partition the training dataset into K groups and assign group k to client k. Namely, each client has its own local training dataset. We reserve the testing set on the server side for evaluating the performance of the global model. ... We employ Dirichlet sampling Dir(α) to synthesize client heterogeneity; it is widely used in the FL literature (Wang et al., 2020a; Yurochkin et al., 2019; Ye et al., 2023). The smaller the value of α, the greater the degree of non-IIDness. We apply the same data synthesis approach to all methods for a fair comparison. ... We set α = 0.1, 0.5, and 100, respectively. When α is set to 100, we consider the data to be distributed in an IID manner. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions the software environment. |
| Software Dependencies | Yes | We conduct experiments under Python 3.7.16 and PyTorch 1.13.1 (Paszke et al., 2019). |
| Experiment Setup | Yes | If not mentioned otherwise, the number of clients, participation ratio, and local epochs are set to 20, 1, and 1, respectively. We set β = 0.1 for CNN models and β = 0.01 for ResNet models. We set the initial learning rate as 0.08 and use a decaying LR scheduler in all experiments; that is, in each round, the local learning rate is 0.99 × (the learning rate of the last round). We adopt local weight decay in all experiments, with a weight decay factor of 5e-4. We use the SGD optimizer as the clients' local optimizer and set momentum to 0.9. |
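The Dirichlet-based partitioning quoted in the Dataset Splits row can be sketched as follows. This is a minimal reconstruction of the standard Dir(α) split used across the cited FL literature, not the paper's exact code; the function name and seed handling are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=20, alpha=0.5, seed=0):
    """Partition sample indices across clients via Dirichlet sampling.

    For each class, draw per-client proportions from Dir(alpha) and split
    that class's (shuffled) indices accordingly. Smaller alpha yields a
    more non-IID split; a very large alpha approaches an IID split.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cut points that divide this class's samples among the clients.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices
```

With α = 0.1 most clients end up dominated by a few classes, while α = 100 reproduces the near-IID setting the paper uses as its IID baseline.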
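The Pseudocode row notes that FedLWS adds layer-wise shrinkage steps on top of FedAvg aggregation. A minimal sketch of that server-side step is below, assuming plain weighted averaging followed by a per-layer shrink factor γ; the paper's adaptive rule for computing γt l (Algorithm 1, Appendix C.1) is not reproduced here, so γ is passed in as a given. The function name is hypothetical.

```python
def aggregate_with_layerwise_shrinking(client_states, client_weights, gammas):
    """FedAvg-style aggregation followed by per-layer weight shrinking.

    client_states: list of per-client parameter dicts (name -> value).
    client_weights: aggregation weights summing to 1 (e.g. data fractions).
    gammas: dict mapping layer name -> shrink factor gamma in (0, 1];
            gamma = 1 for every layer recovers plain FedAvg.
    """
    global_state = {}
    for name in client_states[0]:
        # Standard FedAvg weighted average for this layer.
        avg = sum(w * s[name] for w, s in zip(client_weights, client_states))
        # Layer-wise shrinking: scale the aggregated layer by its gamma.
        global_state[name] = gammas.get(name, 1.0) * avg
    return global_state
```

In practice the values would be tensors (e.g. entries of a PyTorch `state_dict`); scalars are used here only to keep the sketch self-contained.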