Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
Authors: Like Jian, Dong Liu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate our theoretical findings across various network architectures, loss functions, and optimization methods. (From Abstract) ... 6. Numerical Experiments (Section title) |
| Researcher Affiliation | Academia | 1School of Cyber Science and Technology, Beihang University, Beijing, China. Correspondence to: Dong Liu <EMAIL>. |
| Pseudocode | No | The paper describes theoretical proofs and numerical experiments but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Codes to reproduce the main results are available at https://github.com/kkhuge/ICML2025. |
| Open Datasets | Yes | We conduct experiments on two widely used image classification datasets: MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al.). |
| Dataset Splits | Yes | To partition the datasets among different clients and generate non-IID data, we follow the approach proposed by Hsu et al. (2019)... The mini MNIST dataset is created by randomly selecting two classes from the MNIST dataset, followed by randomly sampling 50 images from each class for the training set and 10 images from each class for the test set. A similar approach is used to generate the mini-CIFAR-10 dataset, which contains 500 training images and 100 test images. |
| Hardware Specification | No | The paper describes experimental setups including datasets, network architectures, loss functions, and optimization methods, but it does not provide any specific hardware details such as GPU/CPU models or processor types used for running the experiments. |
| Software Dependencies | No | The paper mentions using SGD, cross-entropy loss, and MSE loss, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | Although our theoretical analysis is based on GD with learning rate η = O(n⁻¹), to show our conclusions can be extended to more practical settings, we use SGD with batch size 64 and set a common learning rate η = 0.1 with a weight decay of 0.0005... Each global round consists of τ = 5 local SGD iterations, σW = 1, σb = 0.1. (From Figure 2 caption). |
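The Dataset Splits row describes two procedures: the Dirichlet-based non-IID partition of Hsu et al. (2019) and the construction of mini-MNIST / mini-CIFAR-10 by per-class subsampling. A minimal sketch of both, assuming label arrays as input; the function names and the `alpha`/`seed` parameters are illustrative, not from the paper:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Non-IID split following Hsu et al. (2019): for each class,
    draw client proportions from Dirichlet(alpha) and assign that
    class's samples accordingly. Smaller alpha -> more label skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # cumulative cut points partitioning this class across clients
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

def make_mini_dataset(labels, classes, n_train, n_test, seed=0):
    """Mini dataset as described in the paper: pick the given classes,
    then sample n_train / n_test disjoint indices per class
    (e.g. 2 classes, 50 train and 10 test per class for mini MNIST)."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        train.extend(idx[:n_train].tolist())
        test.extend(idx[n_train:n_train + n_test].tolist())
    return train, test
```

With `classes=[0, 1]`, `n_train=50`, `n_test=10` this reproduces the stated mini-MNIST sizes (100 train, 20 test images).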
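The Experiment Setup row fixes the optimization protocol: each global FedAvg round runs τ = 5 local SGD iterations per client with batch size 64, learning rate 0.1, and weight decay 0.0005, after which the server averages the client weights. A minimal sketch of one such round on a squared-loss linear model; the linear model is a stand-in (the paper studies wide networks with init scales σW = 1, σb = 0.1, which this sketch does not model), and the function name is hypothetical:

```python
import numpy as np

def fedavg_round(global_w, client_data, tau=5, lr=0.1, batch=64, wd=5e-4, seed=0):
    """One FedAvg global round with the paper's hyperparameters:
    each client takes tau local SGD steps (batch 64, lr 0.1,
    weight decay 5e-4) starting from the global weights, then the
    server returns the average of the updated client weights."""
    rng = np.random.default_rng(seed)
    updated = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(tau):
            # sample a mini-batch without replacement
            idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
            Xb, yb = X[idx], y[idx]
            # squared-loss gradient plus weight-decay term
            grad = Xb.T @ (Xb @ w - yb) / len(Xb) + wd * w
            w -= lr * grad
        updated.append(w)
    # server step: plain (unweighted) average of client weights
    return np.mean(updated, axis=0)
```

Iterating this round function over the client splits produced by the non-IID partition gives the full training loop the setup describes.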