Asynchronous Training Schemes in Distributed Learning with Time Delay

Authors: Haoxiang Wang, Zhanhong Jiang, Chao Liu, Soumik Sarkar, Dongxiang Jiang, Young M Lee

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For empirical validation, we demonstrate the performance of the algorithm with four deep neural network architectures on three benchmark datasets. PC-ASGD is deployed on distributed GPUs with three datasets (CIFAR-10, CIFAR-100, and Tiny ImageNet) using the PreResNet110, DenseNet, ResNet20, and EfficientNet architectures. Our proposed algorithm outperforms the existing delay-tolerant algorithms as well as the variants of the proposed algorithm using only the predicting step or the clipping step.
Researcher Affiliation | Collaboration | Haoxiang Wang (EMAIL), Department of Automation, Tsinghua University; Zhanhong Jiang (EMAIL), Translational AI Center, Iowa State University; Chao Liu (EMAIL), Department of Energy and Power Engineering, Tsinghua University; Soumik Sarkar (EMAIL), Department of Mechanical Engineering, Iowa State University; Dongxiang Jiang (EMAIL), Department of Energy and Power Engineering, Tsinghua University; Young M. Lee (EMAIL), Johnson Controls
Pseudocode | Yes | Algorithm 1: PC-ASGD; Algorithm 2: PC-ASGD-PV
Open Source Code | No | The paper does not provide an explicit statement about releasing the code for the described methodology, nor a direct link to a source-code repository for PC-ASGD. It mentions tools such as 'PySyft Ryffel et al. (2018)' and 'pytorch-classification Yang (2019)', but these are third-party or general frameworks, not this paper's PC-ASGD implementation.
Open Datasets | Yes | CIFAR-10, CIFAR-100, and Tiny ImageNet are used in the experiments, following the settings in Krizhevsky (2012). The training data is randomly assigned to each agent, and the parameters of the deep learning structure are maintained within each agent and communicated with the predefined delays. The testing set is utilized by each agent to verify performance, where the metric is the average accuracy among the agents. Six runs are carried out for each case, and the mean and variance are obtained and listed in Table 3. Numerical studies are also conducted on Tiny ImageNet (Le & Yang, 2015) and a wind turbine dataset (Liu et al., 2014).
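The "training data is randomly assigned to each agent" protocol quoted above can be sketched as follows. This is a hypothetical illustration only: the paper does not specify the partition scheme or agent count, and the `partition_indices` helper, seed, and round-robin dealing are assumptions.

```python
import random


def partition_indices(num_samples, num_agents, seed=0):
    """Randomly assign training-sample indices to agents (hypothetical sketch).

    The paper only says the data is randomly assigned per agent; here the
    shuffled indices are dealt round-robin so shard sizes differ by at most 1.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return [indices[i::num_agents] for i in range(num_agents)]


# e.g., CIFAR-10's 50,000 training images split across 8 agents
shards = partition_indices(50_000, 8)
```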
Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, Tiny ImageNet, and a wind turbine dataset. It states 'The training data is randomly assigned to each agent, and the parameters of the deep learning structure are maintained within each agent' and 'The testing set is utilized for each agent', but it does not provide the specific train/validation/test splits (e.g., percentages, sample counts, or references to standard splits for these datasets) that would be needed for reproduction.
Hardware Specification | Yes | Our experiments are implemented and evaluated on a GTX-1080 Ti with an Intel Xeon 2.55 GHz processor and 32 GB RAM.
Software Dependencies | No | The paper mentions 'pytorch-classification Yang (2019)', which implies the use of PyTorch, but it does not specify version numbers for PyTorch or other software libraries/dependencies. Appendix C refers to 'Model Settings' and 'Hardware environment' but lists only hardware details, not software versions.
Experiment Setup | Yes | The batch size is 128. After a hyperparameter search over (0.1, 0.01, 0.001), the learning rate is set to 0.01 for the first 160 epochs and then changed to 0.001. Decays are applied at epochs (80, 120, 160, 200). The approximation coefficient λ is set to 1. λ = 0.001 was first tried, as suggested by DC-ASGD (Zheng et al., 2017), but the results showed that with this value the predicting step did not affect the training process. Considering the upper bound of 1, the values (0.001, 0.1, 1) were tried, and λ = 1 was chosen based on performance.
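The setup described in this row can be written down as a small configuration sketch. This is a minimal illustration, not the paper's code: the text only gives two learning-rate values (0.01 until epoch 160, then 0.001), so the listed decay epochs (80, 120, 160, 200) may imply a more granular schedule than the `learning_rate` helper below assumes.

```python
# Hyperparameters quoted from the paper's experiment setup.
BATCH_SIZE = 128
LAMBDA = 1.0  # approximation coefficient, chosen after trying (0.001, 0.1, 1)


def learning_rate(epoch):
    """Two-phase schedule as stated in the text:
    0.01 for the first 160 epochs, then 0.001 afterwards."""
    return 0.01 if epoch < 160 else 0.001
```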