Iterate Averaging in the Quest for Best Test Error

Authors: Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts

JMLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "We derive three phenomena from our theoretical results... Inspired by these results, together with empirical investigations... We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures." Section 6 is titled 'Experiments' and Section 7 is titled 'Ablation Studies and Additional Experiments'.
Researcher Affiliation | Collaboration | Diego Granziol (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK; Nicholas P. Baskerville (EMAIL), School of Mathematics, University of Bristol, Bristol, UK; Xingchen Wan (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK; Samuel Albanie (EMAIL), Department of Engineering, University of Cambridge, Cambridge, UK; Stephen Roberts (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK. The email domain 'purestrength.ai' suggests an industry affiliation, while the other authors are affiliated with universities.
Pseudocode | Yes | Algorithm 1: Gadam/GadamX
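The pseudocode in question builds on iterate averaging, the paper's central technique. As a point of reference only, here is a minimal pure-Python sketch of generic (Polyak-style) running averaging of optimiser iterates; it is NOT the authors' Algorithm 1 (Gadam/GadamX), whose specifics are given in the paper itself, and the function name and toy trajectory are our own.

```python
def averaged_iterates(iterates):
    """Yield the running average of a stream of parameter vectors.

    Illustrative sketch of iterate (Polyak) averaging in general,
    not a reproduction of the paper's Algorithm 1 (Gadam/GadamX).
    """
    avg, n = None, 0
    for w in iterates:
        n += 1
        if avg is None:
            avg = list(w)
        else:
            # Incremental mean, elementwise: avg <- avg + (w - avg) / n
            avg = [a + (x - a) / n for a, x in zip(avg, w)]
        yield avg

# Toy example: iterates oscillating around the optimum [0, 0].
# The running average settles near the optimum even though the
# individual iterates keep bouncing.
trajectory = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
final_avg = list(averaged_iterates(trajectory))[-1]
print(final_avg)
```

The point the sketch makes is the one motivating iterate averaging: the averaged parameters can land closer to a minimum than any single late-stage iterate does.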
Open Source Code | No | The paper does not provide a specific repository link, an explicit statement of code release for the described methodology, or any mention that the code is included in supplementary materials. Mentions of other GitHub repositories refer to third-party tools or codebases used by the authors.
Open Datasets | Yes | "We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures." These are well-known, publicly available datasets, with citations such as (Krizhevsky et al., 2009) for CIFAR, (Russakovsky et al., 2015) for ImageNet, and (Marcus et al., 1993) for Penn Treebank.
Dataset Splits | No | The paper mentions using specific datasets, a batch size of 128, and 'standard data augmentation'. However, it does not explicitly state the percentages or absolute counts of the training, validation, and test splits for the datasets used, nor does it cite specific predefined splits for each dataset.
Hardware Specification | Yes | "We always use a single GPU for any single run of experiment. We use one of the three possible GPUs for our experiment: NVIDIA GeForce GTX 1080 Ti, GeForce RTX 2080 Ti or Tesla V100."
Software Dependencies | Yes | "Unless otherwise stated, all experiments are run with PyTorch 1.1 on a Python 3.7 Anaconda environment with GPU acceleration."
Experiment Setup | Yes | The paper provides detailed learning rate schedules for experiments with and without iterate averaging, hyperparameter tuning ranges for learning rates and weight decay for the CIFAR and ImageNet experiments, and specific values for the momentum parameters (β = 0.9 for SGD, {β1, β2} = {0.9, 0.999} for Adam and variants), epsilon (ϵ = 10⁻⁸), and batch size (128).
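The hyperparameter values quoted above can be gathered into a single configuration sketch. Only the numeric values come from the paper; the dictionary layout and key names below are our own illustrative choices, not the authors' code.

```python
# Illustrative config collecting the hyperparameters reported in the paper.
# Values are from the quoted text; the structure and names are assumptions.
experiment_setup = {
    "batch_size": 128,
    "sgd": {"momentum": 0.9},          # β = 0.9 for SGD
    "adam": {
        "betas": (0.9, 0.999),         # {β1, β2} for Adam and variants
        "eps": 1e-8,                   # ϵ = 10⁻⁸
    },
}

print(experiment_setup["adam"]["betas"])
```

A dictionary like this could then be unpacked into an optimiser constructor (e.g. a PyTorch `torch.optim.Adam(params, betas=..., eps=...)` call) when re-running the experiments.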