Iterate Averaging in the Quest for Best Test Error

Authors: Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts

JMLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "We derive three phenomena from our theoretical results... Inspired by these results, together with empirical investigations... We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures." Section 6 is titled 'Experiments' and Section 7 is titled 'Ablation Studies and Additional Experiments'.
Researcher Affiliation | Collaboration | Diego Granziol (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK; Nicholas P. Baskerville (EMAIL), School of Mathematics, University of Bristol, Bristol, UK; Xingchen Wan (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK; Samuel Albanie (EMAIL), Department of Engineering, University of Cambridge, Cambridge, UK; Stephen Roberts (EMAIL), Machine Learning Research Group, University of Oxford, Oxford, UK. The email domain 'purestrength.ai' suggests an industry affiliation, while the other authors are affiliated with universities.
Pseudocode | Yes | Algorithm 1: Gadam/GadamX
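The pseudocode in question builds on iterate averaging, the paper's central technique. As a point of reference only, here is a minimal pure-Python sketch of generic (Polyak-style) running averaging of optimiser iterates; it is NOT the authors' Algorithm 1 (Gadam/GadamX), whose specifics are given in the paper itself, and the function name and toy trajectory are our own.

```python
def averaged_iterates(iterates):
    """Yield the running average of a stream of parameter vectors.

    Illustrative sketch of iterate (Polyak) averaging in general,
    not a reproduction of the paper's Algorithm 1 (Gadam/GadamX).
    """
    avg, n = None, 0
    for w in iterates:
        n += 1
        if avg is None:
            avg = list(w)
        else:
            # Incremental mean, elementwise: avg <- avg + (w - avg) / n
            avg = [a + (x - a) / n for a, x in zip(avg, w)]
        yield avg

# Toy example: iterates oscillating around the optimum [0, 0].
# The running average settles near the optimum even though the
# individual iterates keep bouncing.
trajectory = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
final_avg = list(averaged_iterates(trajectory))[-1]
print(final_avg)
```

The point the sketch makes is the one motivating iterate averaging: the averaged parameters can land closer to a minimum than any single late-stage iterate does.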
Open Source Code | No | The paper does not provide a specific repository link, an explicit statement of code release for the described methodology, or any mention that the code is included in supplementary materials. Mentions of other GitHub repositories refer to third-party tools or codebases used by the authors.
Open Datasets | Yes | "We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures." These are well-known, publicly available datasets, with citations such as (Krizhevsky et al., 2009) for CIFAR, (Russakovsky et al., 2015) for ImageNet, and (Marcus et al., 1993) for Penn Treebank.
Dataset Splits | No | The paper mentions using specific datasets, a batch size of 128, and 'standard data augmentation'. However, it does not explicitly state the percentages or absolute counts of the training, validation, and test splits for the datasets used, nor does it cite specific predefined splits for each dataset.
Hardware Specification | Yes | "We always use a single GPU for any single run of experiment. We use one of the three possible GPUs for our experiment: NVIDIA GeForce GTX 1080 Ti, GeForce RTX 2080 Ti or Tesla V100."
Software Dependencies | Yes | "Unless otherwise stated, all experiments are run with PyTorch 1.1 on a Python 3.7 Anaconda environment with GPU acceleration."
Experiment Setup | Yes | The paper provides detailed learning rate schedules for experiments with and without iterate averaging, hyperparameter tuning ranges for learning rates and weight decay for the CIFAR and ImageNet experiments, and specific values for the momentum parameters (β = 0.9 for SGD, {β1, β2} = {0.9, 0.999} for Adam and variants), epsilon (ϵ = 10⁻⁸), and batch size (128).
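The hyperparameter values quoted above can be gathered into a single configuration sketch. Only the numeric values come from the paper; the dictionary layout and key names below are our own illustrative choices, not the authors' code.

```python
# Illustrative config collecting the hyperparameters reported in the paper.
# Values are from the quoted text; the structure and names are assumptions.
experiment_setup = {
    "batch_size": 128,
    "sgd": {"momentum": 0.9},          # β = 0.9 for SGD
    "adam": {
        "betas": (0.9, 0.999),         # {β1, β2} for Adam and variants
        "eps": 1e-8,                   # ϵ = 10⁻⁸
    },
}

print(experiment_setup["adam"]["betas"])
```

A dictionary like this could then be unpacked into an optimiser constructor (e.g. a PyTorch `torch.optim.Adam(params, betas=..., eps=...)` call) when re-running the experiments.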