Gradient Masked Averaging for Federated Learning
Authors: Irene Tenison, Sai Aravind Sreeramadas, Vaikkunth Mugunthan, Edouard Oyallon, Irina Rish, Eugene Belilovsky
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on multiple FL algorithms with in-distribution, real-world, feature-skewed out-of-distribution, and quantity-imbalanced datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients. In this section, we show empirically that the proposed GMA tends to outperform standard aggregation, converging at a rate similar to or better than standard aggregation, while enhancing the generalization of the global model. |
| Researcher Affiliation | Academia | (1) Massachusetts Institute of Technology, MA, USA; (2) Université de Montréal, Quebec, Canada; (3) Mila, Quebec AI Institute, Canada; (4) Sorbonne University, Paris, France; (5) Concordia University, Quebec, Canada |
| Pseudocode | Yes | Algorithm 1: Gradient Masked FedAvg (McMahan et al., 2017). Server executes: initialize w_0 ∈ R^d randomly; for each server epoch t = 1, 2, 3, ..., T do ... |
| Open Source Code | Yes | Code for our experiments is included in the supplementary materials and will be made available at the time of publication. The code is available at https://github.com/arvi797/FL. |
| Open Datasets | Yes | Details of the datasets explored and the respective skews induced are summarised in Table 1. The datasets mentioned include MNIST, FMNIST, CIFAR-10, CIFAR-100, Tiny ImageNet, FEMNIST, and FedCMNIST. |
| Dataset Splits | Yes | Our experiments include label distribution skew, feature distribution skew, quantity skew, real-world data distribution, and mixed (label and feature) skew. The test data consists of data from the same domain as the train dataset distributed across clients, but from one user (or a set of users) not included in the set of train clients. To simulate quantity imbalance, we used a Dirichlet-distribution-based quantity skew with β = 0.5, as in Li et al. (2021a), on CIFAR-10 across 100 and 10 clients, with 10 clients participating in each communication round. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided in the paper. The paper only refers to clients and servers in a general federated learning context. |
| Software Dependencies | No | An SGD optimizer with momentum (ρ = 0.9) and cross-entropy loss was used to train each client before aggregation at the server in all our experiments. The momentum parameters of the adaptive federated optimizers are fixed at β1 = 0.9 and β2 = 0.99, as per Reddi et al. (2021). For our experiments, we used CIFAR-10 and ResNet models. No specific software versions for libraries or programming languages are mentioned. |
| Experiment Setup | Yes | An SGD optimizer with momentum (ρ = 0.9) and cross-entropy loss was used to train each client before aggregation at the server in all our experiments. The momentum parameters of the adaptive federated optimizers are fixed at β1 = 0.9 and β2 = 0.99, as per Reddi et al. (2021). For each of the considered algorithms, we tune the local client learning rate, global learning rate, τ, and number of local epochs (CIFAR and Tiny ImageNet) to report the best performance of each algorithm. More details on the hyperparameter tuning are given in the Appendix. The grid was fixed to be the same for all datasets and algorithms: η_l ∈ {1e-3, 1e-2, 5e-2, 1e-1}, η_g ∈ {1e-2, 1e-1, 1, 1.5, 2}, τ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. Throughout the experiments we have maintained τ = 0.4. |
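The gradient-masked aggregation named in the pseudocode row can be sketched as below. This is an illustrative NumPy reimplementation based on the checklist's description, not the authors' released code: the function name `gradient_masked_average`, the soft-masking rule (full weight where the sign-agreement score reaches τ, damping by the score otherwise), and the toy inputs are our assumptions.

```python
import numpy as np

def gradient_masked_average(client_updates, tau=0.4):
    """Aggregate client pseudo-gradients with a sign-agreement mask.

    Sketch: for each parameter, compute how strongly client updates
    agree in sign; parameters whose agreement score is at least `tau`
    keep full weight, the rest are damped by their score.
    """
    updates = np.stack(client_updates)                     # (K, d)
    agreement = np.abs(np.mean(np.sign(updates), axis=0))  # in [0, 1]
    mask = np.where(agreement >= tau, 1.0, agreement)
    return mask * updates.mean(axis=0)

# Toy example: 3 clients, 4 parameters (values are illustrative).
ups = [np.array([0.1,  0.2, -0.3, 0.0]),
       np.array([0.2, -0.1, -0.1, 0.0]),
       np.array([0.1,  0.3, -0.2, 0.0])]
agg = gradient_masked_average(ups, tau=0.4)
```

With τ = 0.4 (the value maintained in the experiments row), the first and third parameters, where all clients agree in sign, pass through at full weight, while the second, where clients disagree, is damped.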
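The Dirichlet quantity skew mentioned in the dataset-splits row (β = 0.5, following Li et al. (2021a)) can be sketched as follows. The function name and the exact slicing scheme are our assumptions; this only illustrates the general recipe of drawing client shares from a Dirichlet distribution and partitioning indices accordingly.

```python
import numpy as np

def dirichlet_quantity_split(n_samples, n_clients, beta=0.5, seed=0):
    """Partition dataset indices across clients with a Dirichlet(beta)
    quantity skew: sample per-client proportions, then slice a shuffled
    index list. Smaller beta gives more imbalanced client shares.
    """
    rng = np.random.default_rng(seed)
    proportions = rng.dirichlet(np.full(n_clients, beta))
    counts = (proportions * n_samples).astype(int)
    counts[-1] = n_samples - counts[:-1].sum()  # give remainder to last client
    indices = rng.permutation(n_samples)
    return np.split(indices, np.cumsum(counts)[:-1])

# e.g. CIFAR-10 train set (50,000 samples) over 10 clients
shards = dirichlet_quantity_split(50_000, 10, beta=0.5)
```

Each element of `shards` is one client's index set; every sample is assigned to exactly one client, with sizes varying according to the sampled proportions.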