Locally Adaptive Federated Learning
Authors: Sohom Mukherjee, Nicolas Loizou, Sebastian U Stich
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theoretical claims by performing illustrative experiments for both i.i.d. and non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS for non-convex experiments, and come with superior generalization performance. |
| Researcher Affiliation | Academia | Sohom Mukherjee (Julius-Maximilians-Universität Würzburg); Nicolas Loizou (Johns Hopkins University); Sebastian U. Stich (CISPA Helmholtz Center for Information Security) |
| Pseudocode | Yes | Algorithm 1 (FedSPS): Federated averaging with fully locally adaptive stepsizes. Algorithm 2 (FedSPS-Normalized): Fully locally adaptive FedSPS with normalization to account for heterogeneity as suggested in Wang et al. (2021). Algorithm 3 (FedSPS-Global): FedSPS with global stepsize aggregation. |
| Open Source Code | No | Our code is based on publicly available repositories for SPS and FedAMS, and will be made available upon acceptance. |
| Open Datasets | Yes | For non-i.i.d. experiments with the MNIST dataset (LeCun et al., 2010), we assign every client samples from exactly two classes of the dataset, the splits being non-overlapping and balanced with each client having the same number of samples (Li et al., 2020b). For non-i.i.d. experiments with the CIFAR-10/CIFAR-100 datasets, we use the Dirichlet distribution over classes following the proposal in Hsu et al. (2019). The convex comparison also mentions LIBSVM datasets (Chang & Lin, 2011): w8a, mushrooms, ijcnn, phishing, a9a. |
| Dataset Splits | No | The paper describes how data is distributed among clients for i.i.d. and non-i.i.d. scenarios (e.g., equally splitting data, assigning samples from two classes for MNIST, Dirichlet distribution for CIFAR-10/100). However, it does not explicitly provide the global training, validation, or test split percentages or sample counts for any of these datasets. It only implies their existence through mentions of 'Train Loss' and 'Test Accuracy'. |
| Hardware Specification | No | The acknowledgments section mentions: 'This work was supported by the Helmholtz Association's Initiative and Networking Fund on the HAICORE@FZJ partition.' While this indicates the use of a computing resource, no specific hardware details such as GPU models, CPU models, or memory specifications are provided. |
| Software Dependencies | No | Our code is based on publicly available repositories for SPS and FedAMS, and will be made available upon acceptance. |
| Experiment Setup | Yes | For all federated training experiments we have 500 communication rounds (the number of communication rounds being T/τ as per our notation), 5 local steps on each client (τ = 5, unless otherwise specified for some ablation experiments), and a batch size of 20 (\|B\| = 20). We fix c = 0.5 for all further experiments. Similarly we fix γb = 1, following similar observations as before. We provide details on hyperparameter tuning for FedAvg and FedAMS, including client learning rate ηl ∈ {0.0001, 0.001, 0.01, 0.1, 1.0}, server learning rate η ∈ {0.001, 0.01, 0.1, 1.0}, β1 = 0.9, β2 = 0.99, and max stabilization factor ϵ ∈ {10⁻⁸, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 10⁰}. |
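The fixed hyperparameters quoted above (c = 0.5, γb = 1, τ = 5 local steps) parameterize the bounded Stochastic Polyak stepsize that FedSPS applies locally on each client. A minimal NumPy sketch of that rule (SPS with upper bound γb and lower bound ℓ* = 0), run for τ = 5 local steps on a toy least-squares client, is shown below; the quadratic objective, random data, and all variable names are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def sps_stepsize(loss, grad_sq_norm, c=0.5, gamma_b=1.0, eps=1e-12):
    """Bounded Stochastic Polyak stepsize, assuming the optimal
    per-sample loss is 0 (interpolation):
        gamma = min( f_i(x) / (c * ||grad f_i(x)||^2), gamma_b ).
    """
    return min(loss / (c * grad_sq_norm + eps), gamma_b)

# Toy single-client illustration: f(x) = 0.5 * ||A x - b||^2,
# constructed so the minimum loss is exactly 0 (interpolation holds).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # batch size 20, 5 parameters
x_star = rng.standard_normal(5)    # ground-truth solution
b = A @ x_star
x = np.zeros(5)                    # client iterate

for _ in range(5):                 # tau = 5 local steps
    r = A @ x - b
    loss = 0.5 * (r @ r)
    grad = A.T @ r
    gamma = sps_stepsize(loss, grad @ grad)  # adaptive, no tuned lr
    x -= gamma * grad
```

Each local step uses only quantities already computed for the gradient step (the mini-batch loss and gradient norm), which is why the stepsize is "locally adaptive": no client or server learning-rate grid search is needed, in contrast to the FedAvg/FedAMS tuning grids listed in the table.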