A Stochastic Bundle Method for Interpolation
Authors: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. |
| Researcher Affiliation | Collaboration | Alasdair Paren EMAIL Department of Engineering Science University of Oxford Oxford, UK Leonard Berrada EMAIL Department of Engineering Science University of Oxford Oxford, UK Rudra P. K. Poudel EMAIL Cambridge Research Laboratory, Toshiba Europe Ltd, Cambridge, UK. M. Pawan Kumar EMAIL Department of Engineering Science University of Oxford Oxford, UK. |
| Pseudocode | Yes | Algorithm 1 Dual Optimisation Algorithm Algorithm 2 The BORAT Algorithm |
| Open Source Code | Yes | The code to reproduce our results is publicly available: https://github.com/oval-group/borat |
| Open Datasets | Yes | Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. ...training variants of residual networks on the SVHN and CIFAR data sets, and training a BiLSTM on the Stanford Natural Language Inference data set. |
| Dataset Splits | Yes | The SVHN data set contains 73k training samples, 26k testing samples and 531k additional easier samples. From the 73k difficult training examples, we select 6k samples for validation; we use all remaining (both difficult and easy) examples for training, for a total of 598k samples. ...We use 45k samples for training and 5k for validation. Tiny ImageNet contains 200 classes for training where each class has 500 images. The validation set contains 10,000 images. |
| Hardware Specification | Yes | All experiments were performed on a single GPU (SVHN, SNLI, CIFAR) or on up to 4 GPUs (ImageNet). Table 7: Average BORAT training epoch time for the CIFAR100 data set, shown for varying N. Time quoted using a batch size of 128, CIFAR100, CE loss, a WideResNet 40-4, and a parallel implementation of BORAT. All optimisers had access to 3 CPU cores and one TITAN Xp GPU. Table 8: Average BORAT training epoch time for the ImageNet data set, shown for varying N. Time quoted using a batch size of 1024, ImageNet, CE loss, a ResNet18, and a parallel implementation of BORAT. All optimisers had access to 12 CPU cores and 4 TITAN Xp GPUs. |
| Software Dependencies | No | For baselines we use the official implementation where available in PyTorch (Paszke et al., 2017). We use our implementation of L4, which we unit-test against the official TensorFlow implementation (Abadi et al., 2015). |
| Experiment Setup | Yes | For SGD, we use the manual schedule for the learning rate of Zagoruyko and Komodakis (2016). For L4Adam and L4Mom, we cross-validate the main learning-rate hyperparameter α to be in {0.0015, 0.015, 0.15} (0.15 is the value recommended by Rolinek and Martius (2018)). For other methods, the learning rate hyperparameter is tuned as a power of 10. The ℓ2 regularization is cross-validated in {0.0001, 0.0005} for all methods but BORAT. For BORAT, the regularization is expressed as a constraint on the ℓ2-norm of the parameters, and its maximal value is set to 100. SGD, BORAT and BPGrad use a Nesterov momentum of 0.9. All methods use a dropout rate of 0.4 and a fixed budget of 160 epochs, following Zagoruyko and Komodakis (2016). A batch size of 128 is used for all experiments. |
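The Experiment Setup row above can be summarised as a small search-space sketch. This is a hedged illustration only: the variable names (`lr`, `l2`, etc.) are ours, not taken from the BORAT codebase, and the exact power-of-10 range for the learning rate is an assumption.

```python
from itertools import product

# Grids quoted from the paper's setup description.
l4_alpha_grid = [0.0015, 0.015, 0.15]            # L4Adam / L4Mom learning rate alpha
power_of_10_lrs = [10.0 ** k for k in range(-4, 1)]  # assumed range: 1e-4 .. 1
l2_grid = [0.0001, 0.0005]                        # l2 regularisation (all but BORAT)

# Fixed settings shared by all methods in the comparison.
common = {"momentum": 0.9, "dropout": 0.4, "epochs": 160, "batch_size": 128}

# Cross-product of learning rate and l2 for a power-of-10-tuned method:
configs = [dict(common, lr=lr, l2=l2)
           for lr, l2 in product(power_of_10_lrs, l2_grid)]
print(len(configs))  # 10 candidate configurations
```

Note that BORAT itself sits outside the `l2_grid` loop: its regularisation is a hard constraint on the ℓ2-norm of the parameters (maximal value 100) rather than a cross-validated penalty.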