How iteration composition influences convergence and stability in deep learning

Authors: Benoit Dherin, Benny Avelin, Anders Karlsson, Hanna Mazzawi, Javier Gonzalvo, Michael Munn

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis shows that in contractive regions (e.g., around minima) backward-SGD converges to a point while the standard forward-SGD generally only converges to a distribution. This leads to improved stability and convergence, which we demonstrate experimentally. Our experiments provide a proof of concept supporting this phenomenon. |
| Researcher Affiliation | Collaboration | Benoit Dherin (Google Research); Benny Avelin (Department of Mathematics, Uppsala University); Anders Karlsson (Department of Mathematics, University of Geneva and Uppsala University); Hanna Mazzawi (Google Research); Javier Gonzalvo (Google Research); Michael Munn (Google Research) |
| Pseudocode | No | The paper describes the algorithms (forward and backward SGD) conceptually and mathematically (e.g., θ_n = T_n ∘ T_{n−1} ∘ ⋯ ∘ T_1(θ)) but does not provide a distinct, structured pseudocode block or algorithm listing. |
| Open Source Code | No | We defer engineering applications leveraging this phenomenon (like efficient implementations of the backward SGD) to future work, while outlining a few potential directions at the paper's conclusion. |
| Open Datasets | Yes | We trained a ResNet-18 with stochastic gradient descent and no regularization on the CIFAR-10 dataset Krizhevsky (2009). ... MLP trained on Fashion MNIST Xiao et al. (2017). ... ResNet-50 model He et al. (2016) using both forward and backward stochastic gradient descent with no regularization on the CIFAR-100 dataset Krizhevsky (2009). |
| Dataset Splits | No | The paper mentions training on datasets like CIFAR-10, Fashion MNIST, and CIFAR-100 but does not explicitly state the training, validation, or test split percentages or sample counts used for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using AdamW as an optimizer but does not specify any software libraries (e.g., TensorFlow, PyTorch) or their version numbers, nor any other relevant software dependencies with versions. |
| Experiment Setup | Yes | We used a learning rate of 0.025 and a batch-size of 8. ... The batch-size was set to 8 while the learning rate was 0.001. ... learning rate of 0.001 and a batch-size of 8. ... learning rate of 0.001 and a batch-size of 16. ... learning rate of 0.00025 and a batch-size of 8. |
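Since the paper releases no pseudocode or code, the iteration-composition idea it studies, forward SGD θ_n = T_n ∘ ⋯ ∘ T_1(θ) versus backward SGD θ_n = T_1 ∘ ⋯ ∘ T_n(θ), can be illustrated with a minimal sketch. Everything below is a hypothetical toy construction, not the authors' setup: each batch map T_k is one SGD step on a 1-D quadratic loss, so each map is a contraction, and the alternating targets `b` stand in for batch noise.

```python
import numpy as np

# Toy batch maps: T_k(theta) = theta - lr * (theta - b_k), one SGD step on the
# quadratic f_k(theta) = 0.5 * (theta - b_k)^2. Each T_k contracts by (1 - lr).
lr = 0.1
b = np.array([1.0 if k % 2 == 0 else -1.0 for k in range(20)])  # hypothetical batch targets

def T(k, theta):
    """One SGD step on batch k."""
    return theta - lr * (theta - b[k])

def forward_sgd(theta0, n):
    """theta_n = T_n ∘ ... ∘ T_1 (theta0): standard SGD, newest map applied last."""
    theta = theta0
    for k in range(n):
        theta = T(k, theta)
    return theta

def backward_sgd(theta0, n):
    """theta_n = T_1 ∘ ... ∘ T_n (theta0): newest map applied first (innermost),
    so the composition is replayed from theta0 and the contractions of the
    earlier maps damp the newest batch's influence."""
    theta = theta0
    for k in reversed(range(n)):
        theta = T(k, theta)
    return theta

# Gap between consecutive iterates: the backward gap shrinks geometrically in n
# (convergence to a point), while the forward iterate keeps fluctuating with
# the latest batch (convergence only in distribution).
d_fwd = abs(forward_sgd(0.0, 20) - forward_sgd(0.0, 19))
d_back = abs(backward_sgd(0.0, 20) - backward_sgd(0.0, 19))
```

On this toy problem the backward gap is roughly lr · (1 − lr)^{n−1}, vanishing as n grows, which matches the paper's claim that backward-SGD converges to a point in contractive regions while forward-SGD does not.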