A Stochastic Bundle Method for Interpolation
Authors: Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. |
| Researcher Affiliation | Collaboration | Alasdair Paren EMAIL Department of Engineering Science University of Oxford Oxford, UK Leonard Berrada EMAIL Department of Engineering Science University of Oxford Oxford, UK Rudra P. K. Poudel EMAIL Cambridge Research Laboratory, Toshiba Europe Ltd, Cambridge, UK. M. Pawan Kumar EMAIL Department of Engineering Science University of Oxford Oxford, UK. |
| Pseudocode | Yes | Algorithm 1 Dual Optimisation Algorithm Algorithm 2 The BORAT Algorithm |
| Open Source Code | Yes | The code to reproduce our results is publicly available: https://github.com/oval-group/borat |
| Open Datasets | Yes | Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust. ...training variants of residual networks on the SVHN and CIFAR data sets, and training a BiLSTM on the Stanford Natural Language Inference data set. |
| Dataset Splits | Yes | The SVHN data set contains 73k training samples, 26k testing samples and 531k additional easier samples. From the 73k difficult training examples, we select 6k samples for validation; we use all remaining (both difficult and easy) examples for training, for a total of 598k samples. ...We use 45k samples for training and 5k for validation. Tiny ImageNet contains 200 classes for training where each class has 500 images. The validation set contains 10,000 images. |
| Hardware Specification | Yes | All experiments were performed on a single GPU (SVHN, SNLI, CIFAR) or on up to 4 GPUs (ImageNet). Table 7: Average BORAT training epoch time for the CIFAR100 data set, shown for varying N. Time quoted using a batch size of 128, CIFAR100, CE loss, a WideResNet 40-4, and a parallel implementation of BORAT. All optimisers had access to 3 CPU cores and one TITAN Xp GPU. Table 8: Average BORAT training epoch time for the ImageNet data set, shown for varying N. Time quoted using a batch size of 1024, ImageNet, CE loss, a ResNet18, and a parallel implementation of BORAT. All optimisers had access to 12 CPU cores and 4 TITAN Xp GPUs. |
| Software Dependencies | No | For baselines we use the official implementation where available in PyTorch (Paszke et al., 2017). We use our implementation of L4, which we unit-test against the official TensorFlow implementation (Abadi et al., 2015). |
| Experiment Setup | Yes | For SGD, we use the manual schedule for the learning rate of Zagoruyko and Komodakis (2016). For L4Adam and L4Mom, we cross-validate the main learning-rate hyperparameter α to be in {0.0015, 0.015, 0.15} (0.15 is the value recommended by Rolinek and Martius (2018)). For other methods, the learning rate hyperparameter is tuned as a power of 10. The ℓ2 regularization is cross-validated in {0.0001, 0.0005} for all methods but BORAT. For BORAT, the regularization is expressed as a constraint on the ℓ2-norm of the parameters, and its maximal value is set to 100. SGD, BORAT and BPGrad use a Nesterov momentum of 0.9. All methods use a dropout rate of 0.4 and a fixed budget of 160 epochs, following Zagoruyko and Komodakis (2016). A batch size of 128 is used for all experiments. |
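The Experiment Setup row above can be summarised as a small search-space sketch. This is a hedged illustration only: the variable names (`lr`, `l2`, etc.) are ours, not taken from the BORAT codebase, and the exact power-of-10 range for the learning rate is an assumption.

```python
from itertools import product

# Grids quoted from the paper's setup description.
l4_alpha_grid = [0.0015, 0.015, 0.15]            # L4Adam / L4Mom learning rate alpha
power_of_10_lrs = [10.0 ** k for k in range(-4, 1)]  # assumed range: 1e-4 .. 1
l2_grid = [0.0001, 0.0005]                        # l2 regularisation (all but BORAT)

# Fixed settings shared by all methods in the comparison.
common = {"momentum": 0.9, "dropout": 0.4, "epochs": 160, "batch_size": 128}

# Cross-product of learning rate and l2 for a power-of-10-tuned method:
configs = [dict(common, lr=lr, l2=l2)
           for lr, l2 in product(power_of_10_lrs, l2_grid)]
print(len(configs))  # 10 candidate configurations
```

Note that BORAT itself sits outside the `l2_grid` loop: its regularisation is a hard constraint on the ℓ2-norm of the parameters (maximal value 100) rather than a cross-validated penalty.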