Locally Adaptive Federated Learning

Authors: Sohom Mukherjee, Nicolas Loizou, Sebastian U Stich

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theoretical claims by performing illustrative experiments for both i.i.d. and non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS in non-convex experiments, and come with superior generalization performance.
Researcher Affiliation | Academia | Sohom Mukherjee (Julius-Maximilians-Universität Würzburg), Nicolas Loizou (Johns Hopkins University), Sebastian U. Stich (CISPA Helmholtz Center for Information Security)
Pseudocode | Yes | Algorithm 1: FedSPS, federated averaging with fully locally adaptive stepsizes. Algorithm 2: FedSPS-Normalized, fully locally adaptive FedSPS with normalization to account for heterogeneity, as suggested in Wang et al. (2021). Algorithm 3: FedSPS-Global, FedSPS with global stepsize aggregation.
Open Source Code | No | Our code is based on publicly available repositories for SPS and FedAMS, and will be made available upon acceptance.
Open Datasets | Yes | For non-i.i.d. experiments with the MNIST dataset (LeCun et al., 2010), we assign every client samples from exactly two classes of the dataset, the splits being non-overlapping and balanced, with each client having the same number of samples (Li et al., 2020b). For non-i.i.d. experiments with the CIFAR-10/CIFAR-100 datasets, we use the Dirichlet distribution over classes, following the proposal in Hsu et al. (2019). The convex comparison also uses LIBSVM datasets (Chang & Lin, 2011): w8a, mushrooms, ijcnn, phishing, and a9a.
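The Dirichlet-based non-i.i.d. split referenced above (Hsu et al., 2019) can be sketched as follows. This is an illustrative reconstruction, not the paper's released code; the function name, concentration parameter `alpha`, and seed handling are assumptions.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients by drawing, for each class,
    per-client proportions from Dirichlet(alpha). Smaller alpha gives
    more heterogeneous (non-i.i.d.) client datasets."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # fraction of this class's samples given to each client
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

With large `alpha` each client's class distribution approaches uniform (near-i.i.d.); with small `alpha` most clients hold only a few classes, mimicking the two-classes-per-client MNIST split in spirit.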
Dataset Splits | No | The paper describes how data is distributed among clients for i.i.d. and non-i.i.d. scenarios (e.g., equally splitting data, assigning samples from two classes for MNIST, Dirichlet distribution for CIFAR-10/100). However, it does not explicitly provide the global training, validation, or test split percentages or sample counts for any of these datasets. It only implies their existence through mentions of 'Train Loss' and 'Test Accuracy'.
Hardware Specification | No | The acknowledgments section mentions: 'This work was supported by the Helmholtz Association's Initiative and Networking Fund on the HAICORE@FZJ partition.' While this indicates the use of a computing resource, no specific hardware details such as GPU models, CPU models, or memory specifications are provided.
Software Dependencies | No | Our code is based on publicly available repositories for SPS and FedAMS, and will be made available upon acceptance.
Experiment Setup | Yes | For all federated training experiments we use 500 communication rounds (the number of communication rounds being T/τ in our notation), 5 local steps on each client (τ = 5, unless otherwise specified for some ablation experiments), and a batch size of 20 (|B| = 20). We fix c = 0.5 for all further experiments, and similarly fix γb = 1, following similar observations as before. Hyperparameter tuning for FedAvg and FedAMS covers the client learning rate ηl ∈ {0.0001, 0.001, 0.01, 0.1, 1.0}, the server learning rate η ∈ {0.001, 0.01, 0.1, 1.0}, β1 = 0.9, β2 = 0.99, and the max stabilization factor ϵ ∈ {10⁻⁸, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 10⁰}.
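The constants c = 0.5 and γb = 1 quoted above are the parameters of the bounded stochastic Polyak stepsize (SPS_max, Loizou et al., 2021) that FedSPS applies at each local client step. A minimal sketch of that stepsize rule, assuming the per-sample optimal loss is taken as 0 and using an illustrative function name and epsilon guard:

```python
import numpy as np

def sps_max_stepsize(loss, grad, c=0.5, gamma_b=1.0, loss_star=0.0, eps=1e-10):
    """Bounded stochastic Polyak stepsize:
    gamma = min( (f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_b ).
    loss_star = 0 is a common choice for over-parameterized models."""
    grad_sq = float(np.dot(grad, grad))
    return min((loss - loss_star) / (c * grad_sq + eps), gamma_b)
```

The stepsize adapts locally to each client's current loss and gradient norm, which is why no client learning-rate grid is needed for FedSPS, in contrast to the ηl and η grids tuned for FedAvg and FedAMS.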