Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization
Authors: Tianyi Liu, Shiyang Li, Jianping Shi, Enlu Zhou, Tuo Zhao
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on both streaming PCA and training deep neural networks are provided to support our findings for Async-MSGD. |
| Researcher Affiliation | Collaboration | Tianyi Liu, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, EMAIL; Shiyang Li, Harbin Institute of Technology, EMAIL; Jianping Shi, SenseTime Group Limited, EMAIL; Enlu Zhou, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, EMAIL; Tuo Zhao, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, EMAIL |
| Pseudocode | No | The paper describes the Async-MSGD algorithm using mathematical equations (e.g., Equation 3, Equation 5) but does not provide a structured pseudocode block. |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available or provide a link to a repository. |
| Open Datasets | Yes | training a 32-layer hyperspherical residual neural network (SphereResNet34) using the CIFAR-100 dataset for a 100-class image classification task. |
| Dataset Splits | Yes | 50k images are used for training, and the remaining 10k are used for testing. |
| Hardware Specification | Yes | We use a computer workstation with 8 Titan XP GPUs. |
| Software Dependencies | No | The paper does not name the deep learning framework used for its neural network experiments or provide version numbers for any libraries, frameworks, or solvers (e.g., TensorFlow, PyTorch, scikit-learn versions). |
| Experiment Setup | Yes | We choose a batch size of 128. We choose the initial step size as 0.2. We decrease the step size by a factor of 0.2 after 60, 120, and 160 epochs. The momentum parameter is tuned over {0.1, 0.3, 0.5, 0.7, 0.9}. |
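Since the paper describes Async-MSGD only through update equations, the combination of momentum and gradient staleness it studies can be sketched as below. This is an illustrative reconstruction, not the authors' code: the toy quadratic objective, the fixed `delay` parameter, and the function names are assumptions; the momentum grid and the step-size schedule (base 0.2, decayed by a factor of 0.2 at epochs 60, 120, and 160) follow the reported setup.

```python
import numpy as np

def async_msgd(grad, x0, lr=0.2, momentum=0.9, delay=2, steps=200):
    """Sketch of momentum SGD with stale gradients (Async-MSGD).

    Asynchrony is modeled by evaluating the gradient at the iterate
    from `delay` steps ago, as if a worker had read an old parameter
    copy before pushing its update.
    """
    history = [x0.copy()]                    # past iterates for delayed reads
    x, v = x0.copy(), np.zeros_like(x0)
    for k in range(steps):
        stale = history[max(0, k - delay)]   # stale parameter snapshot
        v = momentum * v - lr * grad(stale)  # heavy-ball momentum step
        x = x + v
        history.append(x.copy())
    return x

def step_size(epoch, base=0.2, factor=0.2, milestones=(60, 120, 160)):
    """Step-size schedule from the reported setup: multiply by `factor`
    after each milestone epoch."""
    return base * factor ** sum(epoch >= m for m in milestones)

# Toy objective f(x) = 0.5 * ||x||^2, so grad f(x) = x; with a modest
# momentum and delay the delayed iteration still converges to 0.
x_final = async_msgd(lambda x: x, np.array([5.0, -3.0]), momentum=0.5, delay=2)
```

The sketch makes the tradeoff in the paper's title concrete: raising `momentum` or `delay` enlarges the roots of the update recursion, so the two sources of acceleration and staleness compound rather than act independently.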