Personalized Federated Learning: A Unified Framework and Universal Optimization Techniques
Authors: Filip Hanzely, Boxin Zhao, Mladen Kolar
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an extensive numerical evaluation to verify and support the theoretical claims. We perform experiments on both synthetic and real data, with a range of different objectives and methods (both ours and the baselines from the literature). The experiments are designed to shed light on various aspects of the theory. In this section, we present the results on synthetic data, while in the next section, we illustrate the performance of different methods on real data. The code to reproduce the experiments is publicly available at https://github.com/boxinz17/PFL-Unified-Framework. The experiments on synthetic data were conducted on a personal laptop with a CPU (Intel(R) Core(TM) i7-9750H CPU@2.60GHz). The results are summarized over 30 independent runs. |
| Researcher Affiliation | Academia | Filip Hanzely EMAIL Toyota Technological Institute at Chicago Chicago, IL 60637, USA Boxin Zhao EMAIL Mladen Kolar EMAIL The University of Chicago Booth School of Business Chicago, IL 60637, USA |
| Pseudocode | Yes | Algorithm 1 LSGD-PFL Algorithm 2 ACD-PFL Algorithm 3 ASVRCD-PFL (lifted notation) Algorithm 4 ASVRCD-PFL |
| Open Source Code | Yes | The code to reproduce the experiments is publicly available at https://github.com/boxinz17/PFL-Unified-Framework. |
| Open Datasets | Yes | We perform this experiment on synthetically generated data... across four image classification datasets MNIST (Deng, 2012), KMNIST (Clanuwat et al., 2018), FMNIST (Xiao et al., 2017), and CIFAR-10 (Krizhevsky, 2009) with three objective functions (8), (11), and (14). |
| Dataset Splits | Yes | We set the number of devices M = 20. We focus on a non-i.i.d. setting of McMahan et al. (2017) and Liang et al. (2020) by assigning K classes out of ten to each device. We let K = 2, 4, 8 to generate different levels of heterogeneity. A larger K means a more balanced data distribution and thus smaller data heterogeneity. We then randomly select n = 100 samples for each device based on its class assignment for training and n = 300 samples for testing. |
| Hardware Specification | Yes | The experiments on synthetic data were conducted on a personal laptop with a CPU (Intel(R) Core(TM) i7-9750H CPU@2.60GHz). The results are summarized over 30 independent runs. Experiments were conducted on a personal laptop (Intel(R) Core(TM) i7-9750H CPU@2.60GHz) with a GPU (NVIDIA GeForce RTX 2070 with Max-Q Design). |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies (e.g., libraries or frameworks) with version numbers used for the experiments. |
| Experiment Setup | Yes | For LSGD-PFL (Algorithm 1), we set the batch size to compute the stochastic gradient B = 1, the average period τ = 5, and the learning rate η = 0.01. For pFedMe (Algorithm 1 in Dinh et al. (2020)), we set all parameters according to the suggestions in Section 5.2 of Dinh et al. (2020). Specifically, we set the local computation rounds to R = 20, computation complexity to K = 5, mini-batch size to |D| = 5, and η = 0.005. We also set S = M = 20. For pw in ASCD-PFL (Algorithm 6) and ASVRCD-PFL (Algorithm 7), we set it as pw = Lw/(Lβ + Lw). ... We set L = 1.0 for all objectives. We set ρ = pw/n for ASCD-PFL and ASVRCD-PFL. For η, θ2, γ, ν and θ1 in ASVRCD-PFL, we set them according to Theorem 9, where L = 2 max{Lw/pw, Lβ/pβ}, ρ = pw/n, and µ = µ/(3M). We let µ = 0.01. ... The η, ν, γ, ρ in ASCD-PFL are the same as in ASVRCD-PFL, and we let θ = min{0.8, 1/η}. In addition, we initialize all iterates at zero for all algorithms. |
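The non-i.i.d. split quoted in the Dataset Splits row (assign K of ten classes to each of M = 20 devices, then draw n = 100 training and 300 test samples per device) can be sketched as below. This is a hypothetical helper, not the authors' code; the function name `non_iid_split` and the toy label vector are assumptions, while the parameters M, K, and the per-device sample counts come from the quoted text.

```python
import numpy as np

def non_iid_split(labels, M=20, K=2, n_train=100, n_test=300,
                  num_classes=10, seed=0):
    """Assign K of num_classes classes to each of M devices, then sample
    n_train training and n_test test examples per device from those classes."""
    rng = np.random.default_rng(seed)
    # Index the dataset by class label once.
    by_class = {c: np.flatnonzero(labels == c) for c in range(num_classes)}
    devices = []
    for _ in range(M):
        # Each device sees only K distinct classes (heterogeneity knob).
        classes = rng.choice(num_classes, size=K, replace=False)
        pool = np.concatenate([by_class[c] for c in classes])
        idx = rng.choice(pool, size=n_train + n_test, replace=False)
        devices.append({"train": idx[:n_train], "test": idx[n_train:]})
    return devices

# Toy usage: a balanced 10-class label vector with 500 samples per class.
labels = np.repeat(np.arange(10), 500)
splits = non_iid_split(labels, M=20, K=2)
assert len(splits) == 20
assert all(len(d["train"]) == 100 and len(d["test"]) == 300 for d in splits)
```

Increasing K toward 10 makes each device's class pool more balanced, which is exactly how the paper dials data heterogeneity down (K = 2 is most heterogeneous, K = 8 least).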