Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
Authors: Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show how adding noise helps the practical behavior of the algorithms. Since SIGNSGD is better studied empirically and MEDIANSGD is so far of more theoretical interest, we use SIGNSGD to demonstrate the benefit of injecting noise. We conduct experiments on MNIST and CIFAR-10 datasets. |
| Researcher Affiliation | Academia | Xiangyi Chen University of Minnesota EMAIL Tiancong Chen University of Minnesota EMAIL Haoran Sun University of Minnesota EMAIL Zhiwei Steven Wu Carnegie Mellon University EMAIL Mingyi Hong University of Minnesota EMAIL |
| Pseudocode | Yes | Algorithm 1 SIGNSGD (with M nodes), Algorithm 2 MEDIANSGD (with M nodes), Algorithm 3 Noisy SIGNSGD, Algorithm 4 Noisy MEDIANSGD |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We conduct experiments on MNIST and CIFAR-10 datasets. For both datasets, the data distribution on each node is heterogeneous, more specifically, each node contains some exclusive data for one or two out of ten categories. More details about the experiment configuration can be found in Appendix I. For the noisy algorithms we use b = 0.001. The sudden change of performance is caused by learning rate decay, which happens at 1000/3000/5000 iterations. |
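To make the reviewed method concrete, the following is a minimal sketch of one step of Noisy SIGNSGD (Algorithm 3 in the paper): each of the M nodes perturbs its local gradient with zero-mean noise of scale b = 0.001 before taking the sign, and the server aggregates by majority vote. The use of Laplace noise and the function name `noisy_signsgd_step` are assumptions for illustration; the paper specifies only the noise scale b, not every implementation detail.

```python
import numpy as np

def noisy_signsgd_step(params, per_node_grads, lr, b=0.001, rng=None):
    """One step of Noisy SIGNSGD (sketch, not the authors' code).

    Each node adds zero-mean noise of scale b to its local gradient
    before taking the sign; the server takes a majority vote over the
    per-node signs and performs a signed descent step.

    Assumption: Laplace noise is used here for illustration -- the
    paper only fixes the scale parameter b, not the distribution.
    """
    rng = rng or np.random.default_rng()
    signs = []
    for g in per_node_grads:
        noisy_g = g + rng.laplace(scale=b, size=g.shape)  # perturb locally
        signs.append(np.sign(noisy_g))                    # 1-bit compression
    vote = np.sign(np.sum(signs, axis=0))                 # majority vote
    return params - lr * vote                             # signed update
```

With heterogeneous data (each node holding only one or two classes, as in the reviewed setup), the per-node gradients disagree, and the injected noise provably helps the majority vote track the true gradient sign on average.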