Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly

Authors: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing

JMLR 2020

Reproducibility assessment: for each variable, the result and the supporting LLM response are listed below.
Research Type: Experimental
"6. Experiments: We now compare Dragonfly to the following algorithms and packages. RAND: uniform random search; EA: evolutionary algorithm; PDOO: parallel deterministic optimistic optimisation (Grill et al., 2015); HyperOpt (v0.1.1) (Bergstra et al., 2013); SMAC (v0.9.0) (Hutter et al., 2011); Spearmint (Snoek et al., 2012); GPyOpt (v1.2.5) (Authors, 2016). Of these, PDOO is a deterministic non-Bayesian algorithm for Euclidean domains. SMAC, Spearmint, and GPyOpt are model-based BO procedures, where SMAC uses random forests, while Spearmint and GPyOpt use GPs. For EA, we use the same procedure used to optimise the acquisition in Section 5. We begin with experiments on some standard synthetic benchmarks for zeroth-order optimisation."
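The RAND baseline quoted above is plain uniform random search over the domain. A minimal sketch of that baseline for a continuous box domain (the function and parameter names below are illustrative, not Dragonfly's API):

```python
import random

def random_search(f, bounds, n_evals, seed=0):
    """Uniform random search: sample points i.i.d. from the box `bounds`
    and return the best point and value found (minimisation)."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Toy usage: minimise a shifted quadratic on [-5, 5]^2.
best_x, best_y = random_search(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2,
                               bounds=[(-5, 5), (-5, 5)], n_evals=500)
```

Despite its simplicity, this is the standard sanity-check baseline in the paper's comparisons: any model-based method should beat it for the same evaluation budget.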
Researcher Affiliation: Academia
"Kirthevasan Kandasamy (EMAIL); Karun Raju Vysyaraju (EMAIL); Willie Neiswanger (EMAIL); Biswajit Paria (EMAIL); Christopher R. Collins (EMAIL); Jeff Schneider (EMAIL); Barnabas Poczos (EMAIL); Eric P. Xing (EMAIL). Carnegie Mellon University, Pittsburgh, PA 15213, USA"
Pseudocode: Yes
"Algorithm 1: Bayesian Optimisation in Dragonfly with M asynchronous workers"
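The asynchronous setting of Algorithm 1 can be sketched as a dispatch loop: whenever any of the M workers returns, its observation joins the history and a fresh point is issued immediately, with no synchronisation barrier. The sketch below stubs the acquisition step with uniform sampling (Dragonfly would instead fit a surrogate, e.g. a GP, to the history and maximise an acquisition function); all names are illustrative, not Dragonfly's API.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def async_bo(f, bounds, n_evals, n_workers=2, seed=0):
    """Skeleton of an asynchronous BO loop with n_workers parallel workers.
    The acquisition step is a placeholder (uniform sampling); a real BO step
    would condition on `history` via a surrogate model."""
    rng = random.Random(seed)
    history = []  # (point, value) pairs observed so far

    def next_point(history):
        # Placeholder acquisition: uniform sampling over the box `bounds`.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}  # future -> the point it is evaluating
        for _ in range(min(n_workers, n_evals)):
            x = next_point(history)
            pending[pool.submit(f, x)] = x
        finished = 0
        while finished < n_evals:
            # Resume as soon as ANY worker finishes (asynchronous dispatch).
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                history.append((x, fut.result()))
                finished += 1
                if finished + len(pending) < n_evals:  # budget remaining
                    x_new = next_point(history)
                    pending[pool.submit(f, x_new)] = x_new
    return min(history, key=lambda pair: pair[1])

# Toy usage: minimise a quadratic with 3 asynchronous workers.
best_x, best_y = async_bo(lambda x: x[0] ** 2 + x[1] ** 2,
                          bounds=[(-1, 1), (-1, 1)], n_evals=20, n_workers=3)
```

The key structural point mirrored from the pseudocode is that new points are chosen conditioned on all results received so far, rather than waiting for a full batch to complete.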
Open Source Code: Yes
"In this work, we present Dragonfly, an open-source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real-world settings; these include better methods for handling higher-dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io."
Open Datasets: Yes
"Luminous Red Galaxies: Here we used data on Luminous Red Galaxies (LRGs) for maximum likelihood inference on 9 Euclidean cosmological parameters. The likelihood is computed via the galaxy power spectrum. Software and data were taken from Kandasamy et al. (2015b); Tegmark et al. (2006). Type Ia Supernova: We use data on Type Ia supernova for maximum likelihood inference on 3 cosmological parameters... We use data from Davis et al. (2007), and the likelihood is computed using the method described in Shchigolev (2017). Random forest regression, News popularity: In this experiment, we tune random forest regression (RFR) on the news popularity dataset (Fernandes et al., 2015). Gradient Boosted Regression, Naval Propulsion: In this experiment, we tune gradient boosted regression (GBR) on the naval propulsion dataset (Coraddu et al., 2016). SALSA, Energy Appliances: We use the SALSA regression method (Kandasamy and Yu, 2016) on the energy appliances dataset (Candanedo et al., 2017) to tune 30 integral, discrete, and Euclidean parameters of the model. Neural Architecture Search: ...on the blog feedback (Buza, 2014), indoor location (Torres-Sospedra et al., 2014), and slice localisation (Graf et al., 2011) datasets in Figure 11."
Dataset Splits: No
"The training set had 20000 points, but could be approximated via a subset of size z ∈ (5000, 20000) by a multi-fidelity method. The training set had 9000 points, but could be approximated via a subset of size z ∈ (2000, 9000) by a multi-fidelity method. The training set had 8000 points, but could be approximated via a subset of size z ∈ (2000, 8000) by a multi-fidelity method." The paper mentions 'validation error' and 'training set' sizes but does not specify how these datasets were split into training, validation, or test sets for reproduction.
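The multi-fidelity evaluations quoted above approximate the expensive full-training-set score by training on a random subset of size z, with z equal to the full set size recovering the highest-fidelity evaluation. A minimal sketch of that idea, assuming a user-supplied `fit_and_score` routine (all names are illustrative, not Dragonfly's API):

```python
import random

def evaluate_at_fidelity(train_set, z, fit_and_score, seed=0):
    """Multi-fidelity evaluation: approximate the full-data score by
    training/scoring on a random subset of size z (the fidelity).
    z == len(train_set) is the highest-fidelity evaluation."""
    rng = random.Random(seed)
    subset = rng.sample(train_set, z)
    return fit_and_score(subset)

# Toy usage: "training" is just averaging the labels, so the score at
# low fidelity should be close to (but cheaper than) the full-data score.
data = [(i, float(i % 10)) for i in range(20000)]
mean_label = lambda s: sum(y for _, y in s) / len(s)
cheap = evaluate_at_fidelity(data, z=5000, fit_and_score=mean_label)
full = evaluate_at_fidelity(data, z=20000, fit_and_score=mean_label)
```

The BO loop can then spend most of its budget at cheap fidelities and reserve full-data evaluations for promising hyperparameter settings.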
Hardware Specification: Yes
"Each method was given a budget of 4 hours on a 3.3 GHz Intel Xeon processor with 512GB memory. Each method was given a budget of 6 hours on a 3.3 GHz Intel Xeon processor with 512GB memory. Each method was given a budget of 3 hours on a 2.6 GHz Intel Xeon processor with 384GB memory. Each method was given a budget of 8 hours on a 2.6 GHz Intel Xeon processor with 384GB memory. We test both methods in an asynchronously parallel setup of two GeForce GTX 970 (4GB) GPU workers with a computational budget of 8 hours."
Software Dependencies: Yes
"HyperOpt (v0.1.1) (Bergstra et al., 2013); SMAC (v0.9.0) (Hutter et al., 2011); Spearmint (Snoek et al., 2012); GPyOpt (v1.2.5) (Authors, 2016)."
Experiment Setup: Yes
"Each function evaluation trains an architecture with stochastic gradient descent (SGD) with a fixed batch size of 256. We used the number of batch iterations in a one-dimensional fidelity space, i.e. Z = [4000, 20000], for Dragonfly, while NASBOT always queried with z = 20,000 iterations. Additionally, we also impose the following constraints on the space of architectures: maximum number of layers: 60; maximum mass: 10^8; maximum in/out degree: 5; maximum number of edges: 200; maximum number of units per layer: 1024; minimum number of units per layer: 8."
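The architecture constraints listed above amount to a simple validity check on each candidate network. A sketch, assuming a dict representation with hypothetical field names (not Dragonfly's actual data structure; the mass bound is taken as 10^8, since the extracted text's "108" appears to be a garbled superscript):

```python
def satisfies_constraints(arch):
    """Check the search-space constraints from the quoted setup on a
    candidate architecture, given here as a dict with illustrative keys."""
    return (
        arch["num_layers"] <= 60
        and arch["mass"] <= 10 ** 8
        and arch["max_in_degree"] <= 5
        and arch["max_out_degree"] <= 5
        and arch["num_edges"] <= 200
        and all(8 <= u <= 1024 for u in arch["units_per_layer"])
    )

# Toy usage: a small candidate that satisfies every bound.
ok = satisfies_constraints({
    "num_layers": 12, "mass": 5_000_000, "max_in_degree": 3,
    "max_out_degree": 2, "num_edges": 40, "units_per_layer": [64, 128, 256],
})
```

In a NAS loop, such a predicate would be applied to candidates proposed by the acquisition optimiser so that only valid architectures are ever trained.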