Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly

Authors: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing

JMLR 2020

Reproducibility assessment: for each variable, the result and the supporting LLM response are listed below.
Research Type: Experimental
"6. Experiments: We now compare Dragonfly to the following algorithms and packages. RAND: uniform random search; EA: evolutionary algorithm; PDOO: parallel deterministic optimistic optimisation (Grill et al., 2015); HyperOpt (v0.1.1) (Bergstra et al., 2013); SMAC (v0.9.0) (Hutter et al., 2011); Spearmint (Snoek et al., 2012); GPyOpt (v1.2.5) (Authors, 2016). Of these, PDOO is a deterministic non-Bayesian algorithm for Euclidean domains. SMAC, Spearmint, and GPyOpt are model-based BO procedures, where SMAC uses random forests, while Spearmint and GPyOpt use GPs. For EA, we use the same procedure used to optimise the acquisition in Section 5. We begin with experiments on some standard synthetic benchmarks for zeroth-order optimisation."
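The RAND baseline quoted above is plain uniform random search over the domain. A minimal sketch of that baseline for a continuous box domain (the function and parameter names below are illustrative, not Dragonfly's API):

```python
import random

def random_search(f, bounds, n_evals, seed=0):
    """Uniform random search: sample points i.i.d. from the box `bounds`
    and return the best point and value found (minimisation)."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy usage: minimise a shifted quadratic on [-5, 5]^2.
best_x, best_y = random_search(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2,
                               bounds=[(-5, 5), (-5, 5)], n_evals=500)
```

Despite its simplicity, this is the standard sanity-check baseline in the paper's comparisons: any model-based method should beat it for the same evaluation budget.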
Researcher Affiliation: Academia
"Kirthevasan Kandasamy (EMAIL); Karun Raju Vysyaraju (EMAIL); Willie Neiswanger (EMAIL); Biswajit Paria (EMAIL); Christopher R. Collins (EMAIL); Jeff Schneider (EMAIL); Barnabas Poczos (EMAIL); Eric P. Xing (EMAIL). Carnegie Mellon University, Pittsburgh, PA 15213, USA"
Pseudocode: Yes
"Algorithm 1: Bayesian Optimisation in Dragonfly with M asynchronous workers"
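The asynchronous setting of Algorithm 1 can be sketched as a dispatch loop: whenever any of the M workers returns, its observation joins the history and a fresh point is issued immediately, with no synchronisation barrier. The sketch below stubs the acquisition step with uniform sampling (Dragonfly would instead fit a surrogate, e.g. a GP, to the history and maximise an acquisition function); all names are illustrative, not Dragonfly's API.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def async_bo(f, bounds, n_evals, n_workers=2, seed=0):
    """Skeleton of an asynchronous BO loop with n_workers parallel workers.
    The acquisition step is a placeholder (uniform sampling); a real BO step
    would condition on `history` via a surrogate model."""
    rng = random.Random(seed)
    history = []  # (point, value) pairs observed so far

    def next_point(history):
        # Placeholder acquisition: uniform sampling over the box `bounds`.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}  # future -> the point it is evaluating
        for _ in range(min(n_workers, n_evals)):
            x = next_point(history)
            pending[pool.submit(f, x)] = x
        finished = 0
        while finished < n_evals:
            # Resume as soon as ANY worker finishes (asynchronous dispatch).
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                history.append((x, fut.result()))
                finished += 1
                if finished + len(pending) < n_evals:  # budget remaining
                    x_new = next_point(history)
                    pending[pool.submit(f, x_new)] = x_new
    return min(history, key=lambda pair: pair[1])

# Toy usage: minimise a quadratic with 3 asynchronous workers.
best_x, best_y = async_bo(lambda x: x[0] ** 2 + x[1] ** 2,
                          bounds=[(-1, 1), (-1, 1)], n_evals=20, n_workers=3)
```

The key structural point mirrored from the pseudocode is that new points are chosen conditioned on all results received so far, rather than waiting for a full batch to complete.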
Open Source Code: Yes
"In this work, we present Dragonfly, an open-source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real-world settings; these include better methods for handling higher-dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io."
Open Datasets: Yes
"Luminous Red Galaxies: Here we used data on Luminous Red Galaxies (LRGs) for maximum likelihood inference on 9 Euclidean cosmological parameters. The likelihood is computed via the galaxy power spectrum. Software and data were taken from Kandasamy et al. (2015b); Tegmark et al. (2006). Type Ia Supernova: We use data on Type Ia supernova for maximum likelihood inference on 3 cosmological parameters... We use data from Davis et al. (2007), and the likelihood is computed using the method described in Shchigolev (2017). Random forest regression, News popularity: In this experiment, we tune random forest regression (RFR) on the news popularity dataset (Fernandes et al., 2015). Gradient Boosted Regression, Naval Propulsion: In this experiment, we tune gradient boosted regression (GBR) on the naval propulsion dataset (Coraddu et al., 2016). SALSA, Energy Appliances: We use the SALSA regression method (Kandasamy and Yu, 2016) on the energy appliances dataset (Candanedo et al., 2017) to tune 30 integral, discrete, and Euclidean parameters of the model. Neural Architecture Search: ...on the blog feedback (Buza, 2014), indoor location (Torres-Sospedra et al., 2014), and slice localisation (Graf et al., 2011) datasets in Figure 11."
Dataset Splits: No
"The training set had 20000 points, but could be approximated via a subset of size z ∈ (5000, 20000) by a multi-fidelity method. The training set had 9000 points, but could be approximated via a subset of size z ∈ (2000, 9000) by a multi-fidelity method. The training set had 8000 points, but could be approximated via a subset of size z ∈ (2000, 8000) by a multi-fidelity method." The paper mentions 'validation error' and 'training set' sizes but does not specify how these datasets were split into training, validation, or test sets for reproduction.
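The multi-fidelity evaluations quoted above approximate the expensive full-training-set score by training on a random subset of size z, with z equal to the full set size recovering the highest-fidelity evaluation. A minimal sketch of that idea, assuming a user-supplied `fit_and_score` routine (all names are illustrative, not Dragonfly's API):

```python
import random

def evaluate_at_fidelity(train_set, z, fit_and_score, seed=0):
    """Multi-fidelity evaluation: approximate the full-data score by
    training/scoring on a random subset of size z (the fidelity).
    z == len(train_set) is the highest-fidelity evaluation."""
    rng = random.Random(seed)
    subset = rng.sample(train_set, z)
    return fit_and_score(subset)

# Toy usage: "training" is just averaging the labels, so the score at
# low fidelity should be close to (but cheaper than) the full-data score.
data = [(i, float(i % 10)) for i in range(20000)]
mean_label = lambda s: sum(y for _, y in s) / len(s)
cheap = evaluate_at_fidelity(data, z=5000, fit_and_score=mean_label)
full = evaluate_at_fidelity(data, z=20000, fit_and_score=mean_label)
```

The BO loop can then spend most of its budget at cheap fidelities and reserve full-data evaluations for promising hyperparameter settings.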
Hardware Specification: Yes
"Each method was given a budget of 4 hours on a 3.3 GHz Intel Xeon processor with 512GB memory. Each method was given a budget of 6 hours on a 3.3 GHz Intel Xeon processor with 512GB memory. Each method was given a budget of 3 hours on a 2.6 GHz Intel Xeon processor with 384GB memory. Each method was given a budget of 8 hours on a 2.6 GHz Intel Xeon processor with 384GB memory. We test both methods in an asynchronously parallel setup of two GeForce GTX 970 (4GB) GPU workers with a computational budget of 8 hours."
Software Dependencies: Yes
"HyperOpt (v0.1.1) (Bergstra et al., 2013); SMAC (v0.9.0) (Hutter et al., 2011); Spearmint (Snoek et al., 2012); GPyOpt (v1.2.5) (Authors, 2016)."
Experiment Setup: Yes
"Each function evaluation trains an architecture with stochastic gradient descent (SGD) with a fixed batch size of 256. We used the number of batch iterations in a one-dimensional fidelity space, i.e. Z = [4000, 20000], for Dragonfly, while NASBOT always queried with z = 20,000 iterations. Additionally, we also impose the following constraints on the space of architectures: maximum number of layers: 60; maximum mass: 10^8; maximum in/out degree: 5; maximum number of edges: 200; maximum number of units per layer: 1024; minimum number of units per layer: 8."
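The architecture constraints listed above amount to a simple validity check on each candidate network. A sketch, assuming a dict representation with hypothetical field names (not Dragonfly's actual data structure; the mass bound is taken as 10^8, since the extracted text's "108" appears to be a garbled superscript):

```python
def satisfies_constraints(arch):
    """Check the search-space constraints from the quoted setup on a
    candidate architecture, given here as a dict with illustrative keys."""
    return (
        arch["num_layers"] <= 60
        and arch["mass"] <= 10 ** 8
        and arch["max_in_degree"] <= 5
        and arch["max_out_degree"] <= 5
        and arch["num_edges"] <= 200
        and all(8 <= u <= 1024 for u in arch["units_per_layer"])
    )

# Toy usage: a small candidate that satisfies every bound.
ok = satisfies_constraints({
    "num_layers": 12, "mass": 5_000_000, "max_in_degree": 3,
    "max_out_degree": 2, "num_edges": 40, "units_per_layer": [64, 128, 256],
})
```

In a NAS loop, such a predicate would be applied to candidates proposed by the acquisition optimiser so that only valid architectures are ever trained.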