Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

Authors: Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, Arvind Krishnamurthy

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages and the generality of AutoLRS through extensive experiments of training DNNs for tasks from diverse domains using different optimizers.
Researcher Affiliation | Collaboration | Yuchen Jin, Tianyi Zhou, Liangyu Zhao (University of Washington); Yibo Zhu, Chuanxiong Guo (ByteDance Inc.); Marco Canini (KAUST); Arvind Krishnamurthy (University of Washington)
Pseudocode | Yes | Algorithm 1: AutoLRS. Input: (1) number of steps in each training stage, τ; (2) learning-rate search interval (η_min, η_max); (3) number of LRs to evaluate by BO in each training stage, k; (4) number of training steps to evaluate each LR in BO, τ′; (5) trade-off weight in the acquisition function of BO, κ. (A minimal sketch of this search loop appears after the table.)
Open Source Code | Yes | The AutoLRS implementation is available at https://github.com/YuchenJin/autolrs.
Open Datasets | Yes | ResNet-50 (He et al., 2016a) on ImageNet classification (Russakovsky et al., 2015); Transformer (Vaswani et al., 2017) and BERT (Devlin et al., 2019) for NLP tasks. We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Dataset Splits | Yes | AutoLRS aims to find an LR applied to every τ steps that minimizes the resulting validation loss.
Hardware Specification | Yes | We train ResNet-50 on ImageNet (Russakovsky et al., 2015) using SGD with momentum on 32 NVIDIA Tesla V100 GPUs with data parallelism and a mini-batch size of 1024.
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | In our default setting, we set k = 10 and τ′ = τ/10 so that the training steps spent on BO equal the training steps spent on updating the DNN model. We start from τ = 1000 and τ′ = 100 and double τ and τ′ after each stage until τ reaches τ_max. We use τ_max = 8000 for ResNet-50 and Transformer, and τ_max = 32000 for BERT. (See the stage-schedule sketch after the table.)
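
For concreteness, below is a minimal sketch of the BO search that Algorithm 1 runs inside each training stage. It is not the authors' implementation (that lives at https://github.com/YuchenJin/autolrs): the `train_steps` and `validation_loss` callables are hypothetical stand-ins for the user's training loop and validation pass, the Gaussian-process surrogate and the LCB acquisition μ − κσ are assumptions consistent with the κ trade-off weight listed in the inputs, and the loss forecasting used by the full method is omitted.

```python
# Sketch of one AutoLRS stage: evaluate k candidate LRs for tau_prime
# steps each, model the validation loss with a GP over log-LR, and pick
# each next candidate by minimizing the acquisition mu - kappa * sigma.
import copy

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def bo_search_lr(model, train_steps, validation_loss,
                 eta_min, eta_max, k=10, tau_prime=100, kappa=1.0):
    """Return the best LR found for the next training stage.

    `train_steps(model, lr, steps)` and `validation_loss(model)` are
    user-supplied callables (hypothetical names, not from the paper).
    """
    log_lo, log_hi = np.log(eta_min), np.log(eta_max)
    grid = np.linspace(log_lo, log_hi, 256).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    X, y = [], []  # observed (log LR, validation loss) pairs
    for i in range(k):
        if i < 2:
            # Seed the surrogate with random candidates.
            log_lr = float(np.random.uniform(log_lo, log_hi))
        else:
            # Fit the GP and minimize the LCB acquisition over the grid.
            gp.fit(np.array(X).reshape(-1, 1), np.array(y))
            mu, sigma = gp.predict(grid, return_std=True)
            log_lr = float(grid[np.argmin(mu - kappa * sigma), 0])
        probe = copy.deepcopy(model)  # probe runs never touch the real model
        train_steps(probe, lr=float(np.exp(log_lr)), steps=tau_prime)
        X.append(log_lr)
        y.append(validation_loss(probe))
    return float(np.exp(X[int(np.argmin(y))]))
```

In the full method, the loss observed after τ′ probe steps is additionally forecast to the loss expected after τ steps; this sketch scores candidates by the raw τ′-step loss for brevity, and the returned LR would then be applied for the next τ real training steps.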
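
The doubling rule quoted in the Experiment Setup row can likewise be written down directly; this is an illustrative reading of the quoted text, not code from the paper.

```python
def stage_schedule(tau=1000, tau_prime=100, tau_max=8000):
    """Yield (tau, tau_prime) per training stage: both double after each
    stage until tau reaches tau_max (8000 for ResNet-50 and Transformer,
    32000 for BERT), then stay fixed."""
    while True:
        yield tau, tau_prime
        if tau < tau_max:
            tau, tau_prime = 2 * tau, 2 * tau_prime
```

For ResNet-50 this yields stages of (1000, 100), (2000, 200), (4000, 400), (8000, 800), after which every stage stays at τ = 8000, τ′ = 800.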