Iterate Averaging in the Quest for Best Test Error
Authors: Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We derive three phenomena from our theoretical results... Inspired by these results, together with empirical investigations... We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures. Section 6 is titled 'Experiments' and Section 7 is titled 'Ablation Studies and Additional Experiments'. |
| Researcher Affiliation | Collaboration | Diego Granziol EMAIL Machine Learning Research Group University of Oxford, Oxford, UK; Nicholas P. Baskerville EMAIL School of Mathematics University of Bristol, Bristol, UK; Xingchen Wan EMAIL Machine Learning Research Group University of Oxford, Oxford, UK; Samuel Albanie EMAIL Department of Engineering University of Cambridge, Cambridge, UK; Stephen Roberts EMAIL Machine Learning Research Group University of Oxford, Oxford, UK. The email domain 'purestrength.ai' suggests an industry affiliation, while the other authors are affiliated with universities. |
| Pseudocode | Yes | Algorithm 1: Gadam/GadamX |
| Open Source Code | No | The paper does not provide a specific repository link, an explicit statement of code release for the described methodology, or mention that the code is included in supplementary materials. Mentions of other GitHub repositories are for third-party tools or codebases used by the authors. |
| Open Datasets | Yes | We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures. These are well-known, publicly available datasets, with citations such as (Krizhevsky et al., 2009) for CIFAR, (Russakovsky et al., 2015) for ImageNet, and (Marcus et al., 1993) for Penn Treebank. |
| Dataset Splits | No | The paper mentions using specific datasets and a batch size of 128, and 'standard data augmentation'. However, it does not explicitly state the percentages or absolute counts for training, validation, and test splits for the datasets used, nor does it cite specific predefined splits for each dataset. |
| Hardware Specification | Yes | We always use a single GPU for any single experimental run. We use one of three possible GPUs: NVIDIA GeForce GTX 1080 Ti, GeForce RTX 2080 Ti, or Tesla V100. |
| Software Dependencies | Yes | Unless otherwise stated, all experiments are run with PyTorch 1.1 in a Python 3.7 Anaconda environment with GPU acceleration. |
| Experiment Setup | Yes | The paper provides detailed learning rate schedules for experiments with and without iterate averaging, hyperparameter tuning ranges for learning rates and weight decay for the CIFAR and ImageNet experiments, and specific values for momentum parameters (β = 0.9 for SGD, {β1, β2} = {0.9, 0.999} for Adam and variants), epsilon (ϵ = 10⁻⁸), and batch size (128). |
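The paper's central technique, iterate averaging, can be illustrated with a minimal sketch: run momentum SGD and report the running mean of the tail iterates instead of the final iterate. The toy quadratic objective, learning rate, noise level, and step counts below are illustrative assumptions, not the paper's setup; only the SGD momentum β = 0.9 matches the reported hyperparameters.

```python
import random

random.seed(0)

def noisy_grad(w):
    # Gradient of the toy loss 0.5 * (w - 1)^2, plus zero-mean Gaussian noise
    # standing in for minibatch gradient noise.
    return (w - 1.0) + random.gauss(0.0, 1.0)

w = 0.0
velocity = 0.0
beta = 0.9           # SGD momentum, as reported in the paper
lr = 0.05            # illustrative choice
avg_w, n_avg = 0.0, 0

for step in range(2000):
    velocity = beta * velocity - lr * noisy_grad(w)
    w += velocity
    if step >= 1000:                    # average only the "tail" of training
        n_avg += 1
        avg_w += (w - avg_w) / n_avg    # running mean of the iterates

print(f"final iterate: {w:.3f}, averaged iterate: {avg_w:.3f}")
```

Because the averaged iterate smooths out gradient noise around the minimum, it typically lands much closer to the optimum (here w* = 1) than the final raw iterate does, which is the intuition behind the test-error gains studied in the paper.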