Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
Authors: Jinwoo Lim, Suhyun Kim, Soo-Mook Moon
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical analysis on the dispersion term: "We run experiments for evaluation of our analysis." One way to inspect the effect of the dispersion term is to compare the convergence behaviour of FedAvg with and without it; to empirically check this, the authors manually removed the dispersion term from the modified loss (the removal algorithm is in the Appendix). "To evaluate only the effect of the dispersion term, we run experiments with a simple CNN model on a simple dataset, MNIST (LeCun et al. 1998), and a relatively more complex dataset, FEMNIST (Caldas et al. 2018)." Experiments were done in a non-IID environment with a Dirichlet distribution of parameter 0.2, except for FEMNIST, which is naturally non-IID. The batch size was 30 for MNIST and 100 for FEMNIST. |
| Researcher Affiliation | Academia | Jinwoo Lim¹, Suhyun Kim²*, Soo-Mook Moon¹. ¹Seoul National University, ²Korea Institute of Science and Technology. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Pseudocode for FedAvg is in the Appendix. "We first define the necessary variables below, assuming that FedAvg runs with full participation of the clients." |
| Open Source Code | No | The paper does not provide concrete access to source code. It mentions pseudocode in the appendix but no links to a repository or explicit statements about code availability. |
| Open Datasets | Yes | "To evaluate only the effect of the dispersion term, we run experiments with a simple CNN model on a simple dataset, MNIST (LeCun et al. 1998), and a relatively more complex dataset, FEMNIST (Caldas et al. 2018). We experimented with FedSAM on MNIST and Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017) on non-IID with full client participation. We ran experiments on a rather complex dataset, CIFAR-10 (Krizhevsky, Hinton et al. 2009), for a model with residual connections to evaluate the impact of high-order terms of the implicit regularizer." |
| Dataset Splits | No | The paper mentions non-IID environments using Dirichlet distributions (e.g., "non-IID environment of Dirichlet distribution with parameter 0.2" for MNIST/FEMNIST, and "Data is non-IID with a Dirichlet distribution of parameter 0.05" for CIFAR-10). It does not explicitly state specific train/test/validation percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | "To fully observe the effect under the gradient flow, we used a normal SGD optimizer with a small learning rate of 0.001, with no momentum or learning-rate decay. More details on the experimental settings, such as the model architecture and the learning rate, are in the Appendix. The batch size was 30 for MNIST and 100 for FEMNIST. ... 100 clients were trained with a learning rate of 0.001, 3 local epochs, and a batch size of 300." |
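The non-IID client splits described above (Dirichlet distribution with parameter 0.2 for MNIST, 0.05 for CIFAR-10) follow a standard label-skew recipe. The paper does not publish its partitioning code, so the sketch below is only a plausible reconstruction of that common scheme: for each class, a Dirichlet draw decides what fraction of that class's samples each client receives, with smaller alpha giving stronger skew. The function name and signature are illustrative, not from the paper.

```python
import numpy as np

def dirichlet_partition(labels, n_clients=100, alpha=0.2, seed=0):
    """Partition sample indices into non-IID client shards via Dir(alpha).

    For each class c, a Dirichlet(alpha, ..., alpha) draw over the
    n_clients gives the fraction of class c's samples each client gets.
    Smaller alpha -> more concentrated (more skewed) class ownership.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Per-client fractions of class c; they sum to 1.
        fractions = rng.dirichlet(alpha * np.ones(n_clients))
        # Convert cumulative fractions to split points in the index array.
        cuts = (np.cumsum(fractions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(ix) for ix in client_indices]
```

With alpha = 0.2 most clients end up dominated by a few classes, which matches the "non-IID environment of Dirichlet distribution with parameter 0.2" setting quoted in the table.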
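The training loop itself is FedAvg with full client participation and plain SGD (learning rate 0.001, no momentum, no decay, 3 local epochs), per the quoted setup. A minimal sketch of one such round is below, using a least-squares model as a stand-in since the paper's CNN architecture is only described in its Appendix; all function names here are illustrative.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.001, epochs=3):
    """Plain full-batch SGD (no momentum, no decay) on a squared loss."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error / 2
        w = w - lr * grad
    return w

def fedavg_round(w, client_data, lr=0.001, epochs=3):
    """One FedAvg round with full participation: every client runs local
    SGD from the current global weights, then the server averages the
    resulting local models uniformly."""
    local_models = [local_sgd(w, X, y, lr, epochs) for X, y in client_data]
    return np.mean(local_models, axis=0)
```

The backward-error-analysis view in the paper studies what implicit loss this averaged update actually follows under gradient flow, which is why the experiments deliberately use such a small learning rate with vanilla SGD.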