Optimizing Personalized Federated Learning Through Adaptive Layer-Wise Learning
Authors: Weihang Chen, Cheng Yang, Jie Ren, Zhiqiang Li, Zheng Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FLAYER on four representative datasets in computer vision and natural language processing domains. Compared to eight state-of-the-art pFL methods, FLAYER improves the inference accuracy, on average, by 5.20% (up to 14.29%). |
| Researcher Affiliation | Academia | ¹Shaanxi Normal University, China; ²University of Leeds, United Kingdom |
| Pseudocode | Yes | Algorithm 1 details the entire FL process. Algorithm 1 FLAYER. Input: N clients; ρ: client joining ratio; L: loss function; Θ⁰_g: initial global model; η: base local learning rate; s: the hyperparameter of FLAYER. Output: well-performing local models Θ₁, …, Θ_N. 1: Server sends Θ⁰_g to all clients to initialize local models. 2: for iteration t = 1, …, T do 3: Server samples a subset Cₜ of clients according to ρ. 4: Server sends Θ^{t−1}_g to the \|Cₜ\| clients. 5: for client k ∈ Cₜ in parallel do 6: Client k initializes local model Θᵗ_k by Equation (2). 7: Client k obtains Θ̂ᵗ_k by Equations (3)–(4). (Local model training) 8: Client k obtains masked Θᵗ_k by Equations (5)–(9). 9: Client k sends Θᵗ_k to the server. (Uploading) 10: end for 11: Server-side aggregation: 12: Server obtains Θᵗ_g ← Σ_{k∈Cₜ} (n_k / Σ_{j∈Cₜ} n_j) Θᵗ_k by Equation (10). 13: end for 14: return Θ₁, …, Θ_N |
| Open Source Code | Yes | The code is available at https://github.com/lancasterJie/FLAYER/. |
| Open Datasets | Yes | To evaluate the performance of FLAYER, we use a four-layer CNN [McMahan et al., 2017] and ResNet-18 [He et al., 2016] for CV tasks, training them on three benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], and Tiny-ImageNet [Chrabaszcz et al., 2017]. For the NLP task, we train fastText [Joulin et al., 2017] on the AG News dataset [Zhang et al., 2015]. |
| Dataset Splits | Yes | We use the Dirichlet distribution Dir(β) with β = 0.1 [Lin et al., 2020; Wang et al., 2020] to model a high level of heterogeneity across client data... Our experiments consider 20 clients. We examine the impact of statistical heterogeneity on FLAYER and eight other pFL methods, using 20 clients under three settings: β = 0.5, β = 0.1, and β = 0.01. |
| Hardware Specification | Yes | All experiments were conducted on a multi-core server with a 24-core 5.7GHz Intel i9-12900K CPU and an NVIDIA RTX A5000 GPU with 24GB of GPU memory. |
| Software Dependencies | No | The paper describes the models used (CNN, ResNet-18, fastText) but does not provide specific software dependencies, such as programming-language or library versions (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Following FedAvg, we use a batch size of 10 and a single epoch of local model training per iteration... Our experiments consider 20 clients. The number of layers in the head for CNN, ResNet-18, and fastText is 1, 2, and 1, respectively. Following FedALA, we set a base learning rate of 0.1 for ResNet-18 and fastText and 0.005 for CNN during local training. |
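The server-side aggregation in Algorithm 1 (Equation (10)) is the standard sample-count-weighted average used by FedAvg: Θᵗ_g = Σ_{k∈Cₜ} (n_k / Σ_{j∈Cₜ} n_j) Θᵗ_k. A minimal sketch, assuming each client model is a dict of NumPy arrays and `sample_counts` holds each client's n_k (all names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def aggregate(client_models, sample_counts):
    """Weighted average: Theta_g = sum_k (n_k / sum_j n_j) * Theta_k."""
    total = sum(sample_counts)
    global_model = {}
    for name in client_models[0]:
        # Each parameter tensor is averaged with per-client weights n_k / total.
        global_model[name] = sum(
            (n / total) * model[name]
            for model, n in zip(client_models, sample_counts)
        )
    return global_model

# Toy example: two clients holding 1 and 3 samples respectively.
models = [{"w": np.array([1.0, 2.0])}, {"w": np.array([3.0, 6.0])}]
agg = aggregate(models, sample_counts=[1, 3])
# (1/4)*[1, 2] + (3/4)*[3, 6] = [2.5, 5.0]
```

Because the weights sum to 1, the aggregate stays in the convex hull of the client parameters, which is why clients with more data pull the global model harder.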
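The Dir(β) label-skew split quoted in the "Dataset Splits" row is commonly implemented by drawing, for each class, a Dirichlet(β) vector over clients and slicing that class's samples accordingly; smaller β concentrates each class on fewer clients. A hedged sketch of that standard recipe (function and variable names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta, seed=0):
    """Split sample indices across clients with per-class Dirichlet(beta) shares."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class c assigned to each client.
        proportions = rng.dirichlet([beta] * num_clients)
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Toy example: 100 samples, 10 classes, 20 clients, beta = 0.1 (highly skewed).
labels = np.repeat(np.arange(10), 10)
parts = dirichlet_partition(labels, num_clients=20, beta=0.1)
```

Every sample lands on exactly one client; with β = 0.1 most clients see only a few classes, matching the high-heterogeneity setting the paper evaluates.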