Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

Authors: Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages and the generality of AutoLRS through extensive experiments of training DNNs for tasks from diverse domains using different optimizers.
Researcher Affiliation | Collaboration | Yuchen Jin, Tianyi Zhou, Liangyu Zhao (University of Washington); Yibo Zhu, Chuanxiong Guo (ByteDance Inc.); Marco Canini (KAUST); Arvind Krishnamurthy (University of Washington)
Pseudocode | Yes | Algorithm 1: AutoLRS. Input: (1) number of steps in each training stage, τ; (2) learning-rate search interval (η_min, η_max); (3) number of LRs to evaluate by BO in each training stage, k; (4) number of training steps to evaluate each LR in BO, τ′; (5) trade-off weight in the acquisition function of BO, κ. (A minimal sketch of this search loop appears after the table.)
Open Source Code | Yes | The AutoLRS implementation is available at https://github.com/YuchenJin/autolrs.
Open Datasets | Yes | ResNet-50 (He et al., 2016a) on ImageNet classification (Russakovsky et al., 2015); Transformer (Vaswani et al., 2017) and BERT (Devlin et al., 2019) for NLP tasks. We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Dataset Splits | Yes | AutoLRS aims to find an LR applied to every τ steps that minimizes the resulting validation loss.
Hardware Specification | Yes | We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | In our default setting, we set k = 10 and τ′ = τ/10 so that the training steps spent on BO equal the training steps spent on updating the DNN model. We start from τ = 1000 and τ′ = 100 and double τ and τ′ after each stage until τ reaches τ_max. We use τ_max = 8000 for ResNet-50 and Transformer, and τ_max = 32000 for BERT. (See the stage-schedule sketch after the table.)
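
For concreteness, below is a minimal sketch of the BO search that Algorithm 1 runs inside each training stage. It is not the authors' implementation (that lives at https://github.com/YuchenJin/autolrs): the `train_steps` and `validation_loss` callables are hypothetical stand-ins for the user's training loop and validation pass, the Gaussian-process surrogate and the LCB acquisition μ − κσ are assumptions consistent with the κ trade-off weight listed in the inputs, and the loss forecasting used by the full method is omitted.

```python
# Sketch of one AutoLRS stage: evaluate k candidate LRs for tau_prime
# steps each, model the validation loss with a GP over log-LR, and pick
# each next candidate by minimizing the acquisition mu - kappa * sigma.
import copy

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def bo_search_lr(model, train_steps, validation_loss,
                 eta_min, eta_max, k=10, tau_prime=100, kappa=1.0):
    """Return the best LR found for the next training stage.

    `train_steps(model, lr, steps)` and `validation_loss(model)` are
    user-supplied callables (hypothetical names, not from the paper).
    """
    log_lo, log_hi = np.log(eta_min), np.log(eta_max)
    grid = np.linspace(log_lo, log_hi, 256).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    X, y = [], []  # observed (log LR, validation loss) pairs
    for i in range(k):
        if i < 2:
            # Seed the surrogate with random candidates.
            log_lr = float(np.random.uniform(log_lo, log_hi))
        else:
            # Fit the GP and minimize the LCB acquisition over the grid.
            gp.fit(np.array(X).reshape(-1, 1), np.array(y))
            mu, sigma = gp.predict(grid, return_std=True)
            log_lr = float(grid[np.argmin(mu - kappa * sigma), 0])
        probe = copy.deepcopy(model)  # probe runs never touch the real model
        train_steps(probe, lr=float(np.exp(log_lr)), steps=tau_prime)
        X.append(log_lr)
        y.append(validation_loss(probe))
    return float(np.exp(X[int(np.argmin(y))]))
```

In the full method, the loss observed after τ′ probe steps is additionally forecast to the loss expected after τ steps; this sketch scores candidates by the raw τ′-step loss for brevity, and the returned LR would then be applied for the next τ real training steps.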
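
The doubling rule quoted in the Experiment Setup row can likewise be written down directly; this is an illustrative reading of the quoted text, not code from the paper.

```python
def stage_schedule(tau=1000, tau_prime=100, tau_max=8000):
    """Yield (tau, tau_prime) per training stage: both double after each
    stage until tau reaches tau_max (8000 for ResNet-50 and Transformer,
    32000 for BERT), then stay fixed."""
    while True:
        yield tau, tau_prime
        if tau < tau_max:
            tau, tau_prime = 2 * tau, 2 * tau_prime
```

For ResNet-50 this yields stages of (1000, 100), (2000, 200), (4000, 400), (8000, 800), after which every stage stays at τ = 8000, τ′ = 800.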