Asynchronous Training Schemes in Distributed Learning with Time Delay

Authors: Haoxiang Wang, Zhanhong Jiang, Chao Liu, Soumik Sarkar, Dongxiang Jiang, Young M Lee

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For empirical validation, we demonstrate the performance of the algorithm with four deep neural network architectures on three benchmark datasets. PC-ASGD is deployed on distributed GPUs with three datasets (CIFAR-10, CIFAR-100, and Tiny ImageNet) using the PreResNet110, DenseNet, ResNet20, and EfficientNet architectures. Our proposed algorithm outperforms the existing delay-tolerant algorithms as well as the variants of the proposed algorithm using only the predicting step or the clipping step.
Researcher Affiliation | Collaboration | Haoxiang Wang (EMAIL), Department of Automation, Tsinghua University; Zhanhong Jiang (EMAIL), Translational AI Center, Iowa State University; Chao Liu (EMAIL), Department of Energy and Power Engineering, Tsinghua University; Soumik Sarkar (EMAIL), Department of Mechanical Engineering, Iowa State University; Dongxiang Jiang (EMAIL), Department of Energy and Power Engineering, Tsinghua University; Young M. Lee (EMAIL), Johnson Controls
Pseudocode | Yes | Algorithm 1: PC-ASGD; Algorithm 2: PC-ASGD-PV
Open Source Code | No | The paper does not provide an explicit statement about releasing the code for the described methodology, nor a direct link to a source-code repository for PC-ASGD. It mentions tools such as 'PySyft Ryffel et al. (2018)' and 'pytorch-classification Yang (2019)', but these are third-party or general frameworks, not this paper's PC-ASGD implementation.
Open Datasets | Yes | CIFAR-10, CIFAR-100, and Tiny ImageNet are used in the experiments, following the settings in Krizhevsky (2012). The training data is randomly assigned to each agent, and the parameters of the deep learning structure are maintained within each agent and communicated with the predefined delays. The testing set is utilized by each agent to verify performance, where the metric is the average accuracy among the agents. Six runs are carried out for each case, and the mean and variance are obtained and listed in Table 3. Numerical studies are also conducted on Tiny ImageNet (Le & Yang, 2015) and a wind turbine dataset (Liu et al., 2014).
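The "training data is randomly assigned to each agent" protocol quoted above can be sketched as follows. This is a hypothetical illustration only: the paper does not specify the partition scheme or agent count, and the `partition_indices` helper, seed, and round-robin dealing are assumptions.

```python
import random


def partition_indices(num_samples, num_agents, seed=0):
    """Randomly assign training-sample indices to agents (hypothetical sketch).

    The paper only says the data is randomly assigned per agent; here the
    shuffled indices are dealt round-robin so shard sizes differ by at most 1.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return [indices[i::num_agents] for i in range(num_agents)]


# e.g., CIFAR-10's 50,000 training images split across 8 agents
shards = partition_indices(50_000, 8)
```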
Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, Tiny ImageNet, and a wind turbine dataset. It states 'The training data is randomly assigned to each agent, and the parameters of the deep learning structure are maintained within each agent' and 'The testing set is utilized for each agent', but it does not provide the specific train/validation/test splits (e.g., percentages, sample counts, or references to standard splits for these datasets) that would be needed for reproduction.
Hardware Specification | Yes | Our experiments are implemented and evaluated on a GTX-1080 Ti with an Intel Xeon 2.55 GHz processor and 32 GB RAM.
Software Dependencies | No | The paper mentions 'pytorch-classification Yang (2019)', which implies the use of PyTorch, but it does not specify version numbers for PyTorch or other software libraries/dependencies. Appendix C refers to 'Model Settings' and 'Hardware environment' but lists only hardware details, not software versions.
Experiment Setup | Yes | The batch size is 128. After a hyperparameter search over (0.1, 0.01, 0.001), the learning rate is set to 0.01 for the first 160 epochs and then changed to 0.001. Decays are applied at epochs (80, 120, 160, 200). The approximation coefficient λ is set to 1. λ = 0.001 was first tried, as suggested by DC-ASGD (Zheng et al., 2017), but the results showed that with this value the predicting step did not affect the training process. Considering the upper bound of 1, the values (0.001, 0.1, 1) were tried, and λ = 1 was chosen based on performance.
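The setup described in this row can be written down as a small configuration sketch. This is a minimal illustration, not the paper's code: the text only gives two learning-rate values (0.01 until epoch 160, then 0.001), so the listed decay epochs (80, 120, 160, 200) may imply a more granular schedule than the `learning_rate` helper below assumes.

```python
# Hyperparameters quoted from the paper's experiment setup.
BATCH_SIZE = 128
LAMBDA = 1.0  # approximation coefficient, chosen after trying (0.001, 0.1, 1)


def learning_rate(epoch):
    """Two-phase schedule as stated in the text:
    0.01 for the first 160 epochs, then 0.001 afterwards."""
    return 0.01 if epoch < 160 else 0.001
```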