FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking

Authors: Changlong Shi, Jinmeng Li, He Zhao, Dandan Guo, Yi Chang

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments under diverse scenarios demonstrate the superiority of our method over several state-of-the-art approaches, providing a promising tool for enhancing the global model in FL. ... We conduct extensive experiments under diverse scenarios to demonstrate that FedLWS brings considerable accuracy gains over the state-of-the-art FL approaches. ... Table 1: Top-1 test accuracy (%) on four datasets with three different degrees of heterogeneity. ... Figure 1: Empirical observations on CIFAR-10 with CNN as the backbone; see more results in Appendix B.2.
Researcher Affiliation | Academia | Changlong Shi¹, Jinmeng Li¹, He Zhao², Dandan Guo¹, Yi Chang¹ ³ ⁴. School of Artificial Intelligence, Jilin University¹; CSIRO's Data61²; International Center of Future Science, Jilin University³; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China⁴. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Having introduced our adaptive way of setting γ_l^t, we give an overview of our method, FedLWS, in Figure 2. The pseudo-code of FedLWS is shown in Appendix C.1, Algorithm 1, where we highlight the additional steps required by our method compared to FedAvg.
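From the excerpts above, FedLWS augments FedAvg-style aggregation with a per-layer shrink factor γ_l^t applied at round t. The quoted passages do not spell out how the adaptive factors are computed, so the sketch below takes them as given inputs; the function name and interface are illustrative assumptions, not the authors' API:

```python
import numpy as np

def shrink_aggregate(client_layers, agg_weights, gammas):
    """Hedged sketch of layer-wise weight shrinking on top of FedAvg.

    client_layers: one entry per client, each a list of per-layer
        numpy arrays (the client's model parameters).
    agg_weights: FedAvg aggregation weights p_k, summing to 1.
    gammas: per-layer shrink factors gamma_l^t; their adaptive
        computation in FedLWS is not specified in this excerpt.
    """
    n_layers = len(client_layers[0])
    global_layers = []
    for l in range(n_layers):
        # Standard weighted FedAvg aggregation for layer l ...
        agg = sum(p * client[l] for p, client in zip(agg_weights, client_layers))
        # ... followed by the layer-wise shrink, the extra FedLWS step.
        global_layers.append(gammas[l] * agg)
    return global_layers
```

With γ_l^t = 1 for every layer this reduces exactly to FedAvg, which matches the report's framing of FedLWS as FedAvg plus additional steps.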
Open Source Code | Yes | The source code is available at https://github.com/ChanglongShi/FedLWS
Open Datasets | Yes | Dataset and Baselines. In this paper, we consider four image classification datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and Tiny-ImageNet (Chrabaszcz et al., 2017); and three text classification datasets: AG News (Zhang et al., 2015), Sogou News (Zhang et al., 2015), and Amazon Review (Ben-David et al., 2006).
Dataset Splits | Yes | Federated Simulation. To emulate the FL scenario, we randomly partition the training dataset into K groups and assign group k to client k; that is, each client has its own local training dataset. We reserve the testing set on the server side for evaluating the performance of the global model. ... We employ Dirichlet sampling Dir(α) to synthesize client heterogeneity, which is widely used in the FL literature (Wang et al., 2020a; Yurochkin et al., 2019; Ye et al., 2023). The smaller the value of α, the stronger the non-IID skew. We apply the same data synthesis approach to all methods for a fair comparison. ... We set α = 0.1, 0.5, and 100, respectively. When α is set to 100, we consider the data to be distributed in an IID manner.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments; it only mentions the software environment.
Software Dependencies | Yes | We conduct experiments under Python 3.7.16 and PyTorch 1.13.1 (Paszke et al., 2019).
Experiment Setup | Yes | Unless mentioned otherwise, the number of clients, participation ratio, and local epochs are set to 20, 1, and 1, respectively. We set β = 0.1 for CNN models and β = 0.01 for ResNet models. We set the initial learning rate to 0.08 and use a decaying LR scheduler in all experiments; that is, in each round, the local learning rate is 0.99 × the learning rate of the last round. We adopt local weight decay in all experiments, with a weight decay factor of 5e-4. We use the SGD optimizer as the clients' local optimizer and set momentum to 0.9.
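The per-round learning-rate schedule reported above (start at 0.08, multiply by 0.99 each round) can be sketched directly; the helper name and the round count are illustrative, and the commented optimizer line shows how the remaining hyperparameters (SGD, momentum 0.9, weight decay 5e-4) would plug into PyTorch:

```python
def make_lr_schedule(initial_lr=0.08, decay=0.99, rounds=100):
    """Reported schedule: lr_t = 0.99 * lr_{t-1}, starting from 0.08."""
    lrs = [initial_lr]
    for _ in range(rounds - 1):
        lrs.append(lrs[-1] * decay)
    return lrs

# In round t, each client would then build its local optimizer roughly as:
# torch.optim.SGD(model.parameters(), lr=lrs[t],
#                 momentum=0.9, weight_decay=5e-4)
```

After 100 rounds the learning rate has decayed to about 0.08 × 0.99⁹⁹ ≈ 0.03, a gentle geometric decay rather than a step schedule.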