Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
Authors: Jinwoo Lim, Suhyun Kim, Soo-Mook Moon
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on the dispersion term: "We run experiments for evaluation of our analysis." One way to inspect the effect of the dispersion term is to compare the convergence behaviour of FedAvg with and without it; to empirically check this, the authors manually removed the dispersion term from the modified loss (the removal algorithm is in the Appendix). "To evaluate only the effect of the dispersion term, we run experiments with a simple CNN model on a simple dataset, MNIST (LeCun et al. 1998), and a relatively more complex dataset, FEMNIST (Caldas et al. 2018)." Experiments were done in a non-IID environment with a Dirichlet distribution of parameter 0.2, except for FEMNIST, which is naturally non-IID. The batch size was 30 for MNIST and 100 for FEMNIST. |
| Researcher Affiliation | Academia | Jinwoo Lim¹, Suhyun Kim²*, Soo-Mook Moon¹. ¹Seoul National University, ²Korea Institute of Science and Technology. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Pseudocode for FedAvg is in the Appendix. "We first define the necessary variables below, assuming that FedAvg runs with full participation of the clients." |
| Open Source Code | No | The paper does not provide concrete access to source code. It mentions pseudocode in the appendix but no links to a repository or explicit statements about code availability. |
| Open Datasets | Yes | "To evaluate only the effect of the dispersion term, we run experiments with a simple CNN model on a simple dataset, MNIST (LeCun et al. 1998), and a relatively more complex dataset, FEMNIST (Caldas et al. 2018). We experimented with FedSAM on MNIST and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017) on non-IID with full client participation. We ran experiments on a rather complex dataset, CIFAR-10 (Krizhevsky, Hinton et al. 2009), for a model with residual connections to evaluate the impact of high-order terms of the implicit regularizer." |
| Dataset Splits | No | The paper mentions non-IID environments using Dirichlet distributions (e.g., "non-IID environment of Dirichlet distribution with parameter 0.2" for MNIST/FEMNIST, and "Data is non-IID with a Dirichlet distribution of parameter 0.05" for CIFAR-10). It does not explicitly state specific train/test/validation percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | "To fully observe the effect under the gradient flow, we used a normal SGD optimizer with a small learning rate of 0.001, with no momentum or learning-rate decay. More details on the experimental settings, such as the model architecture and the learning rate, are in the Appendix. The batch size was 30 for MNIST and 100 for FEMNIST. ... 100 clients were trained with a learning rate of 0.001, 3 local epochs, and a batch size of 300." |
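The non-IID client splits described above (Dirichlet distribution with parameter 0.2 for MNIST, 0.05 for CIFAR-10) follow a standard label-skew recipe. The paper does not publish its partitioning code, so the sketch below is only a plausible reconstruction of that common scheme: for each class, a Dirichlet draw decides what fraction of that class's samples each client receives, with smaller alpha giving stronger skew. The function name and signature are illustrative, not from the paper.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=100, alpha=0.2, seed=0):
    """Partition sample indices into non-IID client shards via Dir(alpha).

    For each class c, a Dirichlet(alpha, ..., alpha) draw over the
    n_clients gives the fraction of class c's samples each client gets.
    Smaller alpha -> more concentrated (more skewed) class ownership.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Per-client fractions of class c; they sum to 1.
        fractions = rng.dirichlet(alpha * np.ones(n_clients))
        # Convert cumulative fractions to split points in the index array.
        cuts = (np.cumsum(fractions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]
```

With alpha = 0.2 most clients end up dominated by a few classes, which matches the "non-IID environment of Dirichlet distribution with parameter 0.2" setting quoted in the table.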
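The training loop itself is FedAvg with full client participation and plain SGD (learning rate 0.001, no momentum, no decay, 3 local epochs), per the quoted setup. A minimal sketch of one such round is below, using a least-squares model as a stand-in since the paper's CNN architecture is only described in its Appendix; all function names here are illustrative.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.001, epochs=3):
    """Plain full-batch SGD (no momentum, no decay) on a squared loss."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error / 2
        w = w - lr * grad
    return w

def fedavg_round(w, client_data, lr=0.001, epochs=3):
    """One FedAvg round with full participation: every client runs local
    SGD from the current global weights, then the server averages the
    resulting local models uniformly."""
    local_models = [local_sgd(w, X, y, lr, epochs) for X, y in client_data]
    return np.mean(local_models, axis=0)
```

The backward-error-analysis view in the paper studies what implicit loss this averaged update actually follows under gradient flow, which is why the experiments deliberately use such a small learning rate with vanilla SGD.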