Faster Stochastic Optimization with Arbitrary Delays via Adaptive Asynchronous Mini-Batching
Authors: Amit Attia, Ofir Gaash, Tomer Koren
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate the benefits of asynchronous mini-batching, we compare vanilla asynchronous SGD (denoted Async-SGD) with a practical variant of our mini-batch method (Algorithm 1), which uses SGD, denoted Async-MB-SGD, for training a fully connected neural network on the Fashion MNIST classification dataset (Xiao et al., 2017). The dataset consists of 60,000 training images and 10,000 test images, each of size 28×28 pixels and labeled across 10 classes. We use test accuracy as the evaluation metric. |
| Researcher Affiliation | Collaboration | ¹Blavatnik School of Computer Science, Tel Aviv University; ²Google Research, Tel Aviv. Correspondence to: Amit Attia <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Asynchronous mini-batching Algorithm 2: Asynchronous mini-batching sweep |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described in this paper. |
| Open Datasets | Yes | training a fully connected neural network on the Fashion MNIST classification dataset (Xiao et al., 2017). |
| Dataset Splits | Yes | The dataset consists of 60,000 training images and 10,000 test images, each of size 28×28 pixels and labeled across 10 classes. |
| Hardware Specification | No | We adopt the two-phase asynchronous simulation framework of Cohen et al. (2021). In the first phase, we simulate compute times for each worker by drawing from a weighted mixture of two Poisson distributions. In the second phase, we simulate training by having each worker deliver gradients to a central server according to the generated compute schedule. |
| Software Dependencies | No | The paper mentions training a neural network and using cross-entropy loss, but does not specify any software names with version numbers (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | Each worker uses a local mini-batch of size 8. The learning rate is selected separately for each algorithm from a geometric grid with multiplicative factor 3/10: for Async-SGD we search over the range [0.001, 1.0], and for Async-MB-SGD over [0.01, 1.0]. For Async-MB-SGD, we additionally tune the aggregation batch size B (i.e., the number of updates the server accumulates before modifying the model) over the set {1, 2, 4, 8, 16, 32}. We conduct experiments with 40, 160, and 640 workers, using 7,500, 30,000, and 120,000 update steps, respectively. ... To reduce the variation of the last iterate, we use exponential moving averaging with decay 0.99. |
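The two-phase simulation quoted under Hardware Specification can be sketched as follows. This is a minimal illustration, not the authors' code: the Poisson rates, the mixture weight, and the helper names are assumptions, since the paper excerpt does not report the exact parameters.

```python
import heapq
import math
import random


def poisson(rng, lam):
    # Knuth's inverse-CDF method for Poisson sampling (adequate for small rates).
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1


def sample_compute_time(rng, lam_fast=2.0, lam_slow=20.0, p_slow=0.1):
    # Phase 1: draw a per-gradient compute time from a weighted mixture of two
    # Poisson distributions (rates and mixture weight are illustrative guesses).
    lam = lam_slow if rng.random() < p_slow else lam_fast
    return 1 + poisson(rng, lam)  # +1 so a draw of 0 still advances time


def delivery_schedule(num_workers, num_deliveries, seed=0):
    # Phase 2: replay the compute schedule as a stream of gradient deliveries
    # to the central server, ordered by simulated completion time.
    rng = random.Random(seed)
    heap = [(sample_compute_time(rng), w) for w in range(num_workers)]
    heapq.heapify(heap)
    schedule = []
    for _ in range(num_deliveries):
        t, w = heapq.heappop(heap)
        schedule.append((t, w))  # worker w delivers a gradient at time t
        heapq.heappush(heap, (t + sample_compute_time(rng), w))
    return schedule
```

Under this sketch, slow workers (drawn from the high-rate Poisson component) deliver stale gradients, which is exactly the heterogeneous-delay regime the experiment targets.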
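Two pieces of the quoted setup are easy to make concrete: the geometric learning-rate grid (multiplicative factor 3/10) and the exponential moving average with decay 0.99. The sketch below assumes factor 3/10 means multiplying by 0.3 from the top of the range downward; the paper excerpt does not specify the direction or endpoints of the sweep.

```python
def geometric_grid(low, high, factor=0.3):
    # Learning-rate grid: start at the upper end and repeatedly multiply by
    # the factor (3/10 in the paper) until dropping below the lower end.
    grid, lr = [], high
    while lr >= low:
        grid.append(lr)
        lr *= factor
    return grid


def ema_update(avg, params, decay=0.99):
    # Exponential moving average of model parameters with decay 0.99,
    # used in the paper to reduce the variation of the last iterate.
    return [decay * a + (1 - decay) * p for a, p in zip(avg, params)]
```

For the Async-SGD range [0.001, 1.0] this yields the candidates 1.0, 0.3, 0.09, 0.027, 0.0081, 0.00243; the next step (0.000729) falls below 0.001 and is excluded.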