Optimizing Personalized Federated Learning Through Adaptive Layer-Wise Learning

Authors: Weihang Chen, Cheng Yang, Jie Ren, Zhiqiang Li, Zheng Wang

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate FLAYER on four representative datasets in the computer vision and natural language processing domains. Compared to eight state-of-the-art pFL methods, FLAYER improves inference accuracy by 5.20% on average (up to 14.29%).
Researcher Affiliation | Academia | ¹Shaanxi Normal University, China; ²University of Leeds, United Kingdom
Pseudocode | Yes | Algorithm 1 details the entire FL process.
Algorithm 1 FLAYER
Input: N clients; ρ: client joining ratio; L: loss function; Θ_g^0: initial global model; η: base local learning rate; s: the hyperparameter of FLAYER.
Output: well-performing local models Θ_1, ..., Θ_N
1: Server sends Θ_g^0 to all clients to initialize local models.
2: for iteration t = 1, ..., T do
3:   Server samples a subset C_t of clients according to ρ.
4:   Server sends Θ_g^{t−1} to the |C_t| clients.
5:   for client k ∈ C_t in parallel do
6:     Client k initializes local model Θ_k^t by Equation (2).
7:     Client k obtains Θ̂_k^t by Equations (3)–(4).   ▷ Local model training
8:     Client k obtains masked Θ_k^t by Equations (5)–(9).
9:     Client k sends Θ_k^t to the server.   ▷ Uploading
10:   end for
11:   Server-side aggregation:
12:   Server obtains Θ_g^t ← Σ_{k∈C_t} (n_k / Σ_{j∈C_t} n_j) Θ_k^t by Equation (10).
13: end for
14: return Θ_1, ..., Θ_N
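The server-side step on line 12 is a sample-size-weighted average of the uploaded client models. A minimal sketch of that aggregation, assuming each model is represented as a dict of named NumPy parameter arrays (the representation and the function name `aggregate` are illustrative, not from the paper):

```python
import numpy as np

def aggregate(client_models, client_sizes):
    """Weighted aggregation as in Equation (10):
    theta_g = sum_k (n_k / sum_j n_j) * theta_k over the sampled clients."""
    total = sum(client_sizes)
    # Accumulate into zero-initialized arrays shaped like the first model.
    agg = {name: np.zeros_like(p) for name, p in client_models[0].items()}
    for model, n_k in zip(client_models, client_sizes):
        for name, p in model.items():
            agg[name] += (n_k / total) * p
    return agg
```

Clients holding more data (larger n_k) pull the global model proportionally harder, which is the standard FedAvg-style weighting the paper builds on.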
Open Source Code Yes The code is available at https://github.com/lancasterJie/FLAYER/.
Open Datasets | Yes | To evaluate the performance of FLAYER, we use a four-layer CNN [McMahan et al., 2017] and ResNet-18 [He et al., 2016] for CV tasks, training them on three benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], and Tiny-ImageNet [Chrabaszcz et al., 2017]. For the NLP task, we train fastText [Joulin et al., 2017] on the AG News dataset [Zhang et al., 2015].
Dataset Splits | Yes | We use the Dirichlet distribution Dir(β) with β = 0.1 [Lin et al., 2020; Wang et al., 2020] to model a high level of heterogeneity across client data... Our experiments consider 20 clients. We examine the impact of statistical heterogeneity on FLAYER and eight other pFL methods, using 20 clients under three settings: β = 0.5, β = 0.1, and β = 0.01.
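Dirichlet-based splitting draws, for each class, a vector of per-client proportions from Dir(β) and assigns that class's samples accordingly; smaller β yields more skewed (non-IID) client datasets. A minimal sketch of this common partitioning scheme (the helper name and exact cut-point logic are assumptions, not the paper's code):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, beta, seed=0):
    """Assign sample indices to clients using per-class Dir(beta) proportions.
    Smaller beta -> more heterogeneous client label distributions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(np.full(n_clients, beta))
        # Convert cumulative proportions into cut points over this class.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return client_idx
```

With β = 0.01 most clients end up with samples from only a few classes, while β = 0.5 approaches a balanced split.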
Hardware Specification | Yes | All experiments were conducted on a multi-core server with a 24-core 5.7 GHz Intel i9-12900K CPU and an NVIDIA RTX A5000 GPU with 24 GB of GPU memory.
Software Dependencies | No | The paper describes the models used (CNN, ResNet-18, fastText) but does not provide specific software dependencies, such as programming language or library versions (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Following FedAvg, we use a batch size of 10 and a single epoch of local model training per iteration... Our experiments consider 20 clients. The number of layers in the head for the CNN, ResNet-18, and fastText is 1, 2, and 1, respectively. Following FedALA, we set a base learning rate of 0.1 for ResNet-18 and fastText and 0.005 for the CNN during local training.
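The stated hyperparameters can be collected into a single configuration for reproduction attempts. The dictionary below only records values quoted above; its key names are illustrative, and any value not mentioned in the excerpt (e.g., FLAYER's hyperparameter s) is deliberately omitted:

```python
# Experiment configuration from the quoted setup; key names are assumptions.
CONFIG = {
    "num_clients": 20,
    "batch_size": 10,
    "local_epochs": 1,                       # single local epoch per iteration
    "head_layers": {"CNN": 1, "ResNet-18": 2, "fastText": 1},
    "base_lr": {"ResNet-18": 0.1, "fastText": 0.1, "CNN": 0.005},
    "dirichlet_beta": [0.5, 0.1, 0.01],      # heterogeneity settings studied
}
```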