Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show how adding noise helps the practical behavior of the algorithms. Since SIGNSGD is better studied empirically and MEDIANSGD is so far of more theoretical interest, we use SIGNSGD to demonstrate the benefit of injecting noise. We conduct experiments on MNIST and CIFAR-10 datasets. |
| Researcher Affiliation | Academia | Xiangyi Chen University of Minnesota EMAIL Tiancong Chen University of Minnesota EMAIL Haoran Sun University of Minnesota EMAIL Zhiwei Steven Wu Carnegie Mellon University EMAIL Mingyi Hong University of Minnesota EMAIL |
| Pseudocode | Yes | Algorithm 1 SIGNSGD (with M nodes), Algorithm 2 MEDIANSGD (with M nodes), Algorithm 3 Noisy SIGNSGD, Algorithm 4 Noisy MEDIANSGD |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. For both datasets, the data distribution on each node is heterogeneous, more specifically, each node contains some exclusive data for one or two out of ten categories. More details about the experiment configuration can be found in Appendix I. For the noisy algorithms we use b = 0.001. The sudden change of performance is caused by learning rate decay, which happens at 1000/3000/5000 iterations. |
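To make the reviewed method concrete, the following is a minimal sketch of one step of Noisy SIGNSGD (Algorithm 3 in the paper): each of the M nodes perturbs its local gradient with zero-mean noise of scale b = 0.001 before taking the sign, and the server aggregates by majority vote. The use of Laplace noise and the function name `noisy_signsgd_step` are assumptions for illustration; the paper specifies only the noise scale b, not every implementation detail.

```python
import numpy as np

def noisy_signsgd_step(params, per_node_grads, lr, b=0.001, rng=None):
    """One step of Noisy SIGNSGD (sketch, not the authors' code).

    Each node adds zero-mean noise of scale b to its local gradient
    before taking the sign; the server takes a majority vote over the
    per-node signs and performs a signed descent step.

    Assumption: Laplace noise is used here for illustration -- the
    paper only fixes the scale parameter b, not the distribution.
    """
    rng = rng or np.random.default_rng()
    signs = []
    for g in per_node_grads:
        noisy_g = g + rng.laplace(scale=b, size=g.shape)  # perturb locally
        signs.append(np.sign(noisy_g))                    # 1-bit compression
    vote = np.sign(np.sum(signs, axis=0))                 # majority vote
    return params - lr * vote                             # signed update
```

With heterogeneous data (each node holding only one or two classes, as in the reviewed setup), the per-node gradients disagree, and the injected noise provably helps the majority vote track the true gradient sign on average.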