On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Authors: Jianyu Wang, Rudrajit Das, Gauri Joshi, Satyen Kale, Zheng Xu, Tong Zhang

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental By performing experiments on naturally heterogeneous federated datasets, we show that previous theoretical predictions do not align well with practice. FedAvg can have nearly identical performance on both IID and non-IID versions of these datasets. Thus, previous worst-case analyses may be too pessimistic for such datasets. [...] We conduct some experiments on Stack Overflow, a naturally non-IID split dataset for next-word prediction. [...] In Figure 3, we first run mini-batch SGD on Federated EMNIST (FEMNIST) (McMahan et al., 2017) and Stack Overflow Next Word Prediction datasets (Reddi et al., 2019) to obtain an approximation for the optimal model w*. Then we evaluate the average drift at optimum ρ = E_c B_c(w*) and its upper bound as given in (7) on these datasets.
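The drift-at-optimum evaluation quoted above can be sketched as follows. This is an illustrative assumption, not the authors' code: here the per-client drift B_c(w*) is taken to be the displacement of a few local GD steps started from the approximate optimum w*, and ρ is its average norm over clients; the function names and the quadratic test objectives are hypothetical.

```python
import numpy as np

def local_update(w, grad_fn, lr=0.1, steps=5):
    """Run a few local gradient-descent steps from w; return the final point."""
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w

def average_drift_at_optimum(w_star, client_grad_fns, lr=0.1, steps=5):
    """Estimate rho = E_c ||local_update_c(w*) - w*|| averaged over clients.

    w_star: approximate global optimum (e.g. from mini-batch SGD).
    client_grad_fns: one gradient function per client, evaluated on that
    client's local data.
    """
    drifts = [np.linalg.norm(local_update(w_star, g, lr, steps) - w_star)
              for g in client_grad_fns]
    return float(np.mean(drifts))
```

On homogeneous (IID) clients every local gradient vanishes at w*, so this estimate is zero; heterogeneity shows up directly as a positive average drift.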
Researcher Affiliation Collaboration Jianyu Wang EMAIL Carnegie Mellon University; Rudrajit Das EMAIL University of Texas at Austin; Gauri Joshi EMAIL Carnegie Mellon University; Satyen Kale EMAIL Google Research; Zheng Xu EMAIL Google Research; Tong Zhang EMAIL University of Illinois Urbana-Champaign
Pseudocode No The paper describes the Federated Averaging algorithm in detail in Section 2, including its update rule (Equation 2), but does not present it in a structured pseudocode or algorithm block.
Open Source Code No The paper does not contain any explicit statements about releasing source code, nor does it provide a link to a code repository.
Open Datasets Yes In Figure 3, we first run mini-batch SGD on Federated EMNIST (FEMNIST) (McMahan et al., 2017) and Stack Overflow Next Word Prediction datasets (Reddi et al., 2019) to obtain an approximation for the optimal model w*. Then we evaluate the average drift at optimum ρ = E_c B_c(w*) and its upper bound as given in (7) on these datasets. [...] We run the same set of experiments on a non-IID CIFAR-100 dataset.
Dataset Splits Yes From the naturally heterogeneous Stack Overflow dataset, we create its IID version by aggregating and shuffling the data from all clients, and then re-assigning the IID data back to clients. [...] For example, each client may only hold one or very few classes of data (Zhao et al., 2018), or has data for all classes but the amount of each class is randomly drawn from a Dirichlet distribution (Hsu et al., 2019).
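The IID re-splitting procedure quoted in this row (pool all clients' data, shuffle, deal it back out) can be sketched as follows; the function name and dict-of-lists data layout are illustrative assumptions.

```python
import random

def make_iid_version(client_data, seed=0):
    """Pool every client's examples, shuffle, and deal them back so each
    client keeps its original number of examples but receives an IID sample
    of the global distribution."""
    rng = random.Random(seed)
    pooled = [ex for data in client_data.values() for ex in data]
    rng.shuffle(pooled)
    iid, start = {}, 0
    for cid, data in client_data.items():
        iid[cid] = pooled[start:start + len(data)]
        start += len(data)
    return iid
```

Keeping per-client sizes fixed isolates distributional heterogeneity: the IID and non-IID versions differ only in which examples each client holds, not in how many.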
Hardware Specification No The paper provides details on the models (ConvNet, LSTM), loss functions, number of clients, local optimizer, and local learning rates in Table 3, but does not specify any hardware components like GPU or CPU models used for running the experiments.
Software Dependencies No The paper states 'we strictly follow the training setup given in Reddi et al. (2020)' for experiments on FEMNIST, Stack Overflow, and CIFAR-100 datasets, but it does not explicitly list any specific software dependencies or their version numbers.
Experiment Setup Yes In Table 3, the paper provides specific experimental details for FEMNIST, Stack Overflow, and CIFAR-100 datasets, including the model type (ConvNet, LSTM), loss function (Cross-Entropy), number of clients (500, 1000, 200), local optimizer (GD), and local learning rate (0.1, 0.5).
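Although the paper gives no pseudocode block (see the Pseudocode row), the FedAvg update it describes in Section 2 can be sketched as one server round below. This is a minimal illustration under stated assumptions: full model vectors are exchanged, sampled clients run local GD (matching the "GD" local optimizer in Table 3), and the server takes an unweighted average; the function names are hypothetical.

```python
import numpy as np

def fedavg_round(w, client_grad_fns, sampled, local_lr=0.1, local_steps=5):
    """One FedAvg round: each sampled client runs local GD starting from
    the global model w, then the server averages the resulting local models."""
    local_models = []
    for c in sampled:
        w_c = w.copy()
        for _ in range(local_steps):
            w_c = w_c - local_lr * client_grad_fns[c](w_c)
        local_models.append(w_c)
    return np.mean(local_models, axis=0)
```

With quadratic client objectives and full participation, iterating this round contracts toward the average of the client optima, which is a convenient sanity check for an implementation.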