Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Authors: Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, Ameet Talwalkar

JMLR 2017

Reproducibility Assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  "We compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems." "In this section, we evaluate the empirical behavior of Hyperband with three different resource types: iterations, data set subsamples, and feature samples. For all experiments, we compare Hyperband with three well-known Bayesian optimization algorithms: SMAC, TPE, and Spearmint, using their default settings."
Researcher Affiliation: Collaboration
  Lisha Li (Carnegie Mellon University, Pittsburgh, PA 15213)
  Kevin Jamieson (University of Washington, Seattle, WA 98195)
  Giulia DeSalvo (Google Research, New York, NY 10011)
  Afshin Rostamizadeh (Google Research, New York, NY 10011)
  Ameet Talwalkar (Carnegie Mellon University, Pittsburgh, PA 15213; Determined AI)
Pseudocode: Yes
  Algorithm 1: Hyperband algorithm for hyperparameter optimization.
  Figure 9 (bottom): The Hyperband algorithm for the infinite horizon setting; Hyperband calls Successive Halving as a subroutine.
  Figure 10: The finite horizon Successive Halving and Hyperband algorithms are inspired by their infinite horizon counterparts of Figure 9 to handle practical constraints.
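For readers who want a concrete rendering of the pseudocode, the finite-horizon Hyperband loop can be sketched in Python as below. This is an illustrative interpretation, not the authors' implementation: `get_config` and `run_config` are stand-ins for the paper's get_hyperparameter_configuration and run_then_return_val_loss subroutines.

```python
import math
import random


def hyperband(get_config, run_config, R=81, eta=3):
    """Sketch of the finite-horizon Hyperband loop.

    get_config()       -> a fresh random hyperparameter configuration
    run_config(cfg, r) -> validation loss after training cfg with r resource units
    """
    s_max = int(math.floor(math.log(R, eta) + 1e-9))   # number of brackets minus one
    B = (s_max + 1) * R                                # total budget per bracket
    best_cfg, best_loss = None, float("inf")
    for s in reversed(range(s_max + 1)):               # one Successive Halving bracket per s
        n = int(math.ceil(B / R * eta**s / (s + 1)))   # initial number of configurations
        r = R / eta**s                                 # initial resource per configuration
        configs = [get_config() for _ in range(n)]
        for i in range(s + 1):                         # Successive Halving rounds
            r_i = r * eta**i
            losses = [run_config(c, r_i) for c in configs]
            ranked = sorted(zip(losses, configs), key=lambda lc: lc[0])
            if ranked[0][0] < best_loss:
                best_loss, best_cfg = ranked[0]
            keep = max(1, (n // eta**i) // eta)        # keep the top 1/eta fraction
            configs = [cfg for _, cfg in ranked[:keep]]
    return best_cfg, best_loss
```

Each value of s runs one Successive Halving bracket, trading off the number of configurations n against the resource allocated to each; `n // eta**i` corresponds to the paper's n_i = floor(n * eta^-i).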
Open Source Code: No
  "Code and description of algorithm used is available at http://deeplearning.net/tutorial/lenet.html." This URL refers to the LeNet model used in an example application, not to the Hyperband algorithm itself; the paper does not provide a link to, or an explicit statement about, an open-source Hyperband implementation.
Open Datasets: Yes
  "We work with the MNIST data set and optimize hyperparameters for the LeNet convolutional neural network..." "Data sets: We considered three image classification data sets: CIFAR-10 (Krizhevsky, 2009), rotated MNIST with background images (MRBI) (Larochelle et al., 2007), and Street View House Numbers (SVHN) (Netzer et al., 2011)." "We used the framework introduced by Feurer et al. (2015), which explored a structured hyperparameter search space comprised of 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods for a total of 110 hyperparameters."
Dataset Splits: Yes
  "Each data set was split into a training, validation, and test set: (1) CIFAR-10 has 40k, 10k, and 10k instances; (2) MRBI has 10k, 2k, and 50k instances; and (3) SVHN has close to 600k, 6k, and 26k instances for training, validation, and test respectively." "Feurer et al. (2015) split each data set into 2/3 training and 1/3 test set, whereas we introduce a validation set to avoid overfitting to the test data. We also used 2/3 of the data for training, but split the rest of the data into two equally sized validation and test sets."
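The split described in that second quote (2/3 train, with the remaining third divided equally between validation and test) is straightforward to reproduce. A minimal sketch, with an illustrative function name not taken from the paper:

```python
import random


def split_dataset(items, seed=0):
    """Shuffle, then split: 2/3 train and the remaining third divided
    equally into validation and test (name and seed are illustrative)."""
    rng = random.Random(seed)
    items = items[:]          # avoid mutating the caller's list
    rng.shuffle(items)
    n = len(items)
    n_train = (2 * n) // 3
    n_val = (n - n_train) // 2
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```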
Hardware Specification: Yes
  "The experiments took the equivalent of over 1 year of GPU hours on NVIDIA GRID K520 cards available on Amazon EC2 g2.8xlarge instances." "All experiments were performed on Google Cloud Compute n1-standard-1 instances in us-central1-f region with 1 CPU and 3.75GB of memory." "Each hyperparameter optimization algorithm was run for ten trials on Amazon EC2 m4.2xlarge instances." "We ran 10 trials of each searcher, with each trial lasting 12 hours on an n1-standard-16 machine from Google Cloud Compute."
Software Dependencies: No
  "The exact architecture used is the 18% model provided on cuda-convnet for CIFAR-10." "The width of the response normalization layer was excluded due to limitations of the Caffe framework." "The default SVM method in Scikit-learn is single core and takes hours to train on CIFAR-10." The paper mentions various software components and frameworks (cuda-convnet, Caffe, Scikit-learn) but does not provide specific version numbers for any of them.
Experiment Setup: Yes
  "Our search space includes learning rate, batch size, and number of kernels for the two layers of the network as hyperparameters (details are shown in Table 2 in Appendix A)." "We define the resource allocated to each configuration to be number of iterations of SGD, with one unit of resource corresponding to one epoch, i.e., a full pass over the data set." "We set R to 81 and use the default value of η = 3, resulting in s_max = 4 and thus 5 brackets of Successive Halving with different tradeoffs between n and B/n." "For CIFAR-10 and MRBI, R was set to 300 (or 30k total iterations). For SVHN, R was set to 600 (or 60k total iterations) to accommodate the larger training set. Given R for these experiments, we set η = 4 to yield five Successive Halving brackets for Hyperband."
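The bracket geometry quoted above follows mechanically from R and η. A small sketch (an illustration, not the paper's code) that enumerates the (n_i, r_i) schedule each Successive Halving bracket would run:

```python
import math


def hyperband_schedule(R, eta):
    """Enumerate Hyperband's Successive Halving brackets for a maximum
    resource R and downsampling rate eta (Algorithm 1 notation).

    Returns a list of (s, rounds) pairs, where rounds is a list of
    (n_i, r_i): configurations kept and resource per configuration."""
    s_max = int(math.floor(math.log(R, eta) + 1e-9))
    B = (s_max + 1) * R
    schedule = []
    for s in reversed(range(s_max + 1)):
        n = int(math.ceil(B / R * eta**s / (s + 1)))   # initial configurations
        r = R / eta**s                                 # initial resource each
        rounds = [(n // eta**i, r * eta**i) for i in range(s + 1)]
        schedule.append((s, rounds))
    return schedule
```

For R = 81 and η = 3 this yields the five brackets s = 4, ..., 0: the most aggressive bracket starts 81 configurations at 1 epoch each, while the most conservative runs 5 configurations for the full 81 epochs.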