Faster Convergence of Local SGD for Over-Parameterized Models

Authors: Tiancheng Qin, S. Rasoul Etesami, César A. Uribe

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we validate our theoretical results by performing large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the O(1/T) convergence rate of Local SGD is clearly shown.
Researcher Affiliation | Academia | Tiancheng Qin, Department of Industrial and Systems Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; S. Rasoul Etesami, Department of Industrial and Systems Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign; César A. Uribe, Department of Electrical and Computer Engineering, Rice University.
Pseudocode | Yes | The pseudo-code for the Local SGD algorithm is provided in Algorithm 1.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | We distribute the CIFAR-10 dataset (Krizhevsky et al., 2009) to n = 20 nodes and apply Local SGD to train a ResNet18 neural network (He et al., 2016) on the CIFAR-10 dataset.
Dataset Splits | Yes | We first sort the data by their label, then divide the dataset into 20 shards and assign each of 20 nodes 1 shard. In this way, ten nodes will have image examples of one label, and ten nodes will have image examples of two labels. This regime leads to highly heterogeneous datasets among nodes. ... We partition the dataset in three different ways to reflect different data similarity regimes and evaluate the relationship between training loss, communication rounds, and local steps for Local SGD under each of the three regimes.
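The label-sorted sharding scheme quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy label vector, and the use of NumPy are all assumptions; only the sort-then-shard logic comes from the paper's description.

```python
import numpy as np

def shard_by_label(labels, n_nodes=20):
    """Sort example indices by class label, split them into n_nodes
    contiguous shards, and assign one shard per node. Because each
    shard covers a contiguous run of the sorted labels, every node
    sees only one or two classes (a highly non-IID partition)."""
    order = np.argsort(labels, kind="stable")   # indices sorted by label
    shards = np.array_split(order, n_nodes)     # one contiguous shard per node
    return {node: shard for node, shard in enumerate(shards)}

# Toy CIFAR-10-like label vector: 10 classes, 100 examples each.
labels = np.repeat(np.arange(10), 100)
parts = shard_by_label(labels, n_nodes=20)
# Each node's shard spans at most two distinct labels.
assert all(len(np.unique(labels[idx])) <= 2 for idx in parts.values())
```

With the real CIFAR-10 training set (50,000 examples, 5,000 per class) the same split yields 20 shards of 2,500 examples each, matching the heterogeneous regime the paper describes.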
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using Layer Normalization instead of Batch Normalization within the ResNet18 architecture, but does not specify any software libraries or packages with version numbers for reproducibility.
Experiment Setup | Yes | For this set of experiments, we run the Local SGD algorithm for R = 20000 communication rounds with a different number of local steps per communication round K = 1, 2, 5, 10, 20... We use a training batch size of 8 and choose stepsize η to be 0.1... We stop the algorithm after at most 10^6 communication rounds or if the training loss is below 10^-4. We choose stepsize η = 0.075.
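The structure referenced throughout these rows, R communication rounds, each consisting of K local SGD steps per node followed by model averaging, can be sketched as below. This is a hedged illustration of the generic Local SGD template (Algorithm 1 in the paper is not reproduced here): the quadratic objectives, the function names, and the toy hyperparameters are all assumptions made for a self-contained example.

```python
import numpy as np

def local_sgd(grads, x0, n_nodes=20, R=100, K=5, eta=0.1):
    """Generic Local SGD loop: each of n_nodes runs K local gradient
    steps with stepsize eta, then the server averages the local
    models; this repeats for R communication rounds."""
    x = np.array(x0, dtype=float)
    for _ in range(R):                      # R communication rounds
        local_models = []
        for i in range(n_nodes):
            xi = x.copy()
            for _ in range(K):              # K local steps on node i's data
                xi -= eta * grads[i](xi)
            local_models.append(xi)
        x = np.mean(local_models, axis=0)   # one communication: averaging
    return x

# Toy heterogeneous objectives f_i(x) = 0.5 * (x - c_i)^2, whose
# average is minimized at mean(c_i) = 0; Local SGD should approach it.
centers = np.linspace(-1.0, 1.0, 20)
grads = [lambda x, c=c: x - c for c in centers]
x_final = local_sgd(grads, x0=[0.5], n_nodes=20, R=100, K=5, eta=0.1)
assert abs(x_final[0] - centers.mean()) < 1e-3
```

The paper's experiments instantiate this template with ResNet18 on the sharded CIFAR-10 partitions and the hyperparameters quoted in the row above (e.g., batch size 8, η = 0.1, K ∈ {1, 2, 5, 10, 20}).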