Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg

Authors: Like Jian, Dong Liu

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments validate our theoretical findings across various network architectures, loss functions, and optimization methods." (Abstract) ... "6. Numerical Experiments" (section title)
Researcher Affiliation | Academia | "School of Cyber Science and Technology, Beihang University, Beijing, China. Correspondence to: Dong Liu <EMAIL>."
Pseudocode | No | The paper describes theoretical proofs and numerical experiments but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Codes to reproduce the main results are available at https://github.com/kkhuge/ICML2025."
Open Datasets | Yes | "We conduct experiments on two widely used image classification datasets: MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al.)."
Dataset Splits | Yes | "To partition the datasets among different clients and generate non-IID data, we follow the approach proposed by Hsu et al. (2019)... The mini-MNIST dataset is created by randomly selecting two classes from the MNIST dataset, followed by randomly sampling 50 images from each class for the training set and 10 images from each class for the test set. A similar approach is used to generate the mini-CIFAR-10 dataset, which contains 500 training images and 100 test images."
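The Hsu et al. (2019) scheme draws each client's class proportions from a Dirichlet(α) distribution, so smaller α yields more heterogeneous (non-IID) splits. A minimal sketch of that partition, assuming numpy only; the client count, α, and function name are illustrative, not the paper's code:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) class skew.

    Smaller alpha -> more heterogeneous (non-IID) client datasets.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Draw this class's split proportions over clients, then cut accordingly.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ix) for ix in client_indices]

# Example: 1000 samples over 10 classes, split among 10 clients.
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels)
```

Every sample lands on exactly one client; with α = 0.5 each client typically sees only a few dominant classes, mimicking the paper's non-IID setting.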
Hardware Specification | No | The paper details datasets, network architectures, loss functions, and optimization methods, but provides no hardware specifics such as GPU/CPU models or processor types used to run the experiments.
Software Dependencies | No | The paper mentions using SGD, cross-entropy loss, and MSE loss, but does not name any software with version numbers (e.g., Python, PyTorch, or TensorFlow releases, or other libraries).
Experiment Setup | Yes | "Although our theoretical analysis is based on GD with learning rate η = O(n⁻¹), to show our conclusions can be extended to more practical settings, we use SGD with batch size 64 and set a common learning rate η = 0.1 with a weight decay of 0.0005... Each global round consists of τ = 5 local SGD iterations, σ_W = 1, σ_b = 0.1." (Figure 2 caption)
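The quoted setup is standard FedAvg: each round, clients run τ = 5 local SGD steps (batch 64, η = 0.1, weight decay 0.0005) and the server averages the resulting weights. A minimal numpy sketch of one such round, using a linear least-squares model as a stand-in for the paper's networks; all names and the toy data are illustrative assumptions:

```python
import numpy as np

def local_sgd(w, X, y, eta=0.1, tau=5, batch=64, wd=5e-4, seed=0):
    """Run tau local SGD steps on squared loss with weight decay."""
    rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(tau):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx) + wd * w
        w -= eta * grad
    return w

def fedavg_round(w_global, client_data, **kw):
    """One FedAvg round: broadcast, run local SGD per client, average models."""
    locals_ = [local_sgd(w_global, X, y, **kw) for X, y in client_data]
    return np.mean(locals_, axis=0)

# Toy run: two clients whose inputs come from shifted distributions
# (a crude form of data heterogeneity), sharing one true linear model.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
clients = []
for shift in (0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(200, 2))
    clients.append((X, X @ w_true))
w = np.zeros(2)
for _ in range(50):
    w = fedavg_round(w, clients)
```

Because both toy clients share the same minimizer, the averaged iterate converges to it; with genuinely heterogeneous objectives, τ > 1 local steps introduce the client drift whose dependence on network width the paper analyzes.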