Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity

Authors: Arto Maranjyan, Alexander Tyurin, Peter Richtárik

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using numerical experiments, we demonstrate that Ringmaster ASGD outperforms existing methods (see Section G). The obtained result confirms that Ringmaster ASGD is indeed faster than Delay-Adaptive ASGD and Rennala SGD in the considered setting. One can see that the numerical experiments support our theoretical results, and we significantly improve the convergence rate of the previous version of Asynchronous SGD (Delay-Adaptive ASGD). We run an experiment on a small 2-layer neural network with ReLU activation on the MNIST dataset, showing that our method, Ringmaster ASGD, is more robust and outperforms Delay-Adaptive ASGD and Rennala SGD.
Researcher Affiliation | Academia | 1) King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; 2) AIRI, Moscow, Russia; 3) Skolkovo Institute of Science and Technology, Moscow, Russia.
Pseudocode | Yes | Algorithm 1 (Asynchronous SGD), Algorithm 2 (Rennala SGD), Algorithm 3 (Naive Optimal ASGD), Algorithm 4 (Ringmaster ASGD, without calculation stops), Algorithm 5 (Ringmaster ASGD, with calculation stops)
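The central mechanism of Ringmaster ASGD (Algorithms 4 and 5) is a delay threshold: the server applies a worker's stochastic gradient only if it was computed at a sufficiently recent iterate, and discards it otherwise. The following is a simplified serial simulation of that rule, not the authors' implementation; the function name, the random worker-completion model, and the parameter names are illustrative assumptions.

```python
import random

def ringmaster_asgd_sketch(grad, x0, stepsize, R, n_workers, n_iters, seed=0):
    """Serial simulation of Ringmaster ASGD's delay-threshold rule:
    a gradient is applied only if the point it was computed at is at
    most R server iterations old; otherwise it is discarded. In both
    cases the worker restarts from the current iterate."""
    rng = random.Random(seed)
    x = x0
    k = 0  # server iteration counter (advances only on applied updates)
    # each worker holds (point it is computing a gradient at, iteration stamp)
    workers = [(x, 0) for _ in range(n_workers)]
    while k < n_iters:
        w = rng.randrange(n_workers)     # some worker finishes its gradient
        point, stamp = workers[w]
        if k - stamp <= R:               # delay within threshold: apply update
            x = x - stepsize * grad(point)
            k += 1
        # stale gradients (delay > R) are silently dropped
        workers[w] = (x, k)              # worker restarts at the current iterate
    return x

# Toy usage: minimize f(x) = x^2 / 2, whose gradient is x.
x_final = ringmaster_asgd_sketch(lambda x: x, 10.0, 0.1, R=2, n_workers=4, n_iters=200)
```

Because discarded gradients do not advance the iteration counter, every applied update has delay at most R, which is exactly the property the paper's analysis exploits.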
Open Source Code | No | The paper discusses the implementation and performance of algorithms through experiments but does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | To show that our method also works well for neural networks, we trained a small 2-layer neural network with ReLU activation on the MNIST dataset (LeCun et al., 1998).
Dataset Splits | No | The paper mentions training on the MNIST dataset but does not specify the training, validation, or test splits used for the experiments.
Hardware Specification | Yes | The distributed environment was emulated on machines with Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz.
Software Dependencies | No | The experiments were implemented in Python. No specific version number for Python or any other software libraries with their versions is mentioned.
Experiment Setup | Yes | We tuned the stepsize from the set {5^p : p ∈ [-5, 5]}. Both the batch size for Rennala SGD and the delay threshold for Ringmaster ASGD were tuned from the set {⌈n/4^p⌉ : p ∈ ℕ₀}. We set d = 1729 and n = 6174. We trained a small 2-layer neural network with ReLU activation on the MNIST dataset.
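The two tuning grids in the setup can be enumerated directly. This is a sketch under stated assumptions: p is taken to range over integers, the ceiling in n/4^p is assumed, and the ℕ₀ grid is truncated once it reaches 1 (the paper does not state a truncation rule).

```python
import math

n = 6174  # problem size from the experiment setup

# Stepsize grid: {5^p : p in [-5, 5]}, assuming integer p
stepsizes = [5.0 ** p for p in range(-5, 6)]

# Batch-size / delay-threshold grid: {ceil(n / 4^p) : p in N_0},
# truncated once the value reaches 1 (an assumption)
thresholds = []
p = 0
while True:
    b = math.ceil(n / 4 ** p)
    thresholds.append(b)
    if b == 1:
        break
    p += 1
```

With n = 6174 this yields the decreasing sequence 6174, 1544, 386, 97, 25, 7, 2, 1 for the batch-size/threshold grid, and eleven stepsizes from 5^-5 to 5^5.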