An Empirical Study of Bayesian Optimization: Acquisition Versus Partition

Authors: Erich Merrill, Alan Fern, Xiaoli Fern, Nima Dolatnia

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The primary contribution of this work is to conduct a comprehensive, repeatable evaluation within a common software framework, which we provide as an open-source package. Our results give strong evidence about the relative performance of these methods and reveal a consistent top performer, even when accounting for overall computation time.
Researcher Affiliation | Academia | Erich Merrill (EMAIL), Alan Fern (EMAIL), Xiaoli Fern (EMAIL), Nima Dolatnia (EMAIL); School of EECS, Oregon State University
Pseudocode | Yes | Pseudocode for the ABO algorithm is provided in Algorithm 1. In this paper, we consider two of the most popular acquisition functions, namely Expected Improvement (EI) (Mockus et al., 1978) and Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al., 2010). The paper provides: Algorithm 1 (Bayesian Optimization Process), Algorithm 2 (Simultaneous Optimistic Optimization), Algorithm 3 (Simultaneous Optimistic Optimization, simplified), Algorithm 4 (Locally Oriented Global Optimization), Algorithm 5 (Bayesian Multi-Scale Optimistic Optimization), and Algorithm 6 (Infinite-Metric GP Optimization).
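As a concrete illustration of the ABO loop sketched in Algorithm 1, the following Python snippet pairs a Gaussian process surrogate (squared exponential kernel) with the EI acquisition function. This is a minimal sketch, not the paper's C++ implementation: the 1-D objective, fixed kernel hyperparameters, and grid-based acquisition maximization are all illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt, pi

# Hypothetical 1-D black-box objective (stands in for a benchmark function).
def objective(x):
    return -np.sin(3 * x) - x**2 + 0.7 * x

def sq_exp_kernel(A, B, length=0.3, signal=1.0):
    # Squared exponential kernel matrix between 1-D point sets A and B.
    d = A[:, None] - B[None, :]
    return signal**2 * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xq, noise=1e-5):
    # GP posterior mean and standard deviation at query points Xq.
    K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = sq_exp_kernel(X, Xq)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(sq_exp_kernel(Xq, Xq)) - np.sum(v**2, axis=0),
                  1e-10, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    # EI for maximization: E[max(f(x) - best, 0)] under the GP posterior.
    z = (mu - best) / sigma
    cdf = np.array([0.5 * (1.0 + erf(zi / sqrt(2.0))) for zi in z])
    pdf = np.exp(-0.5 * z**2) / sqrt(2.0 * pi)
    return (mu - best) * cdf + sigma * pdf

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=3)      # three random initial points
y = objective(X)
for _ in range(20):
    cand = np.linspace(-1.0, 2.0, 200)  # grid stand-in for acquisition maximization
    mu, sigma = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(f"best observed value: {y.max():.3f}")
```

Swapping `expected_improvement` for a GP-UCB score (mean plus a β-weighted standard deviation) recovers the other acquisition function the paper studies.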
Open Source Code | Yes | To perform these experiments, we built a custom, extendable C++ framework that executes easily repeatable evaluations of arbitrary black-box optimization algorithms. The source code is publicly available at https://github.com/Eiii/opt_cmp.
Open Datasets | Yes | We chose to exclusively use synthetic benchmark functions as objectives for this evaluation to ensure that each algorithm could be sufficiently evaluated on each objective with the time and resources available. The benchmark functions were selected with the goals of including functions with a wide range of quantitative and qualitative properties and including functions that the authors of the partitioning-based methods used to evaluate their algorithms. The LOGO paper evaluates its algorithm on the sin 2, Branin, Rosenbrock, Hartmann3, Hartmann6, and Shekel (with m = 5, 7, 10) functions (Kawaguchi et al., 2016) with varying dimensionality and parameterizations when possible. IMGPO and BaMSOO are also both evaluated on a subset of these objectives (Kawaguchi et al., 2015; Wang et al., 2014), so we chose to include all of them in our experiments to allow for a more direct comparison between results. Additionally, we chose to include the Rastrigin, Schwefel, and Ackley test functions (Molga and Smutnicki, 2005) in our evaluation to provide more variety among the objectives.
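Several of the named benchmarks are simple closed-form functions. The sketch below gives conventional parameterizations of Branin, Rastrigin, and Ackley (these follow standard references and may differ in domain or scaling from the exact variants used in the paper):

```python
import numpy as np

def branin(x1, x2):
    # Branin test function; global minimum ~0.397887, e.g. at (-pi, 12.275).
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

def rastrigin(x):
    # Rastrigin: highly multimodal; global minimum 0 at the origin.
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x))

def ackley(x):
    # Ackley: nearly flat outer region with a deep well; minimum 0 at the origin.
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n) + 20.0 + np.e)

print(round(branin(-np.pi, 12.275), 4))  # ~0.3979
print(rastrigin([0.0, 0.0]))             # 0.0
print(abs(ackley([0.0, 0.0])) < 1e-12)   # True
```

The mix is deliberate: Branin is smooth with a few basins, while Rastrigin and Ackley are highly multimodal, which stresses the exploration behavior of each optimizer differently.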
Dataset Splits | No | The paper uses synthetic benchmark functions and evaluates algorithms on these functions by running them for a certain number of 'samples' or 'function evaluations' (e.g., 'to a horizon of at least 500 samples'). It does not describe traditional training/test/validation dataset splits, which are typically used in supervised learning contexts.
Hardware Specification | Yes | Each individual evaluation of one algorithm on one objective function was run on one core of a c4.4xlarge Amazon EC2 instance.
Software Dependencies | No | To perform these experiments, we built a custom, extendable C++ framework that executes easily repeatable evaluations of arbitrary black-box optimization algorithms. The source code is publicly available at https://github.com/Eiii/opt_cmp. Our framework includes C++ implementations of all partitioning-based algorithms (SOO, LOGO, BaMSOO, IMGPO), while a modified version of the bayesopt library (Martinez-Cantin, 2014) provides the implementation of BO that we use in our evaluation.
Experiment Setup | Yes | To evaluate the algorithms in a true black-box setting, we use one set of hyperparameters for each algorithm across every objective function. To avoid manually tuning the algorithms with the goal of maximizing their performance on our specific problem set, when possible we use the same hyperparameter settings the authors of each method used during their evaluation process. We use the implementation of BO from the bayesopt (Martinez-Cantin, 2014) library. For our evaluations, we use the squared exponential kernel with automatic relevance determination, in which the parameters are estimated using maximum total likelihood. Each ABO run is started with three randomly chosen initial points (which count against the algorithm's budget), and the GP's parameters are re-estimated from the observed data every second ABO iteration. The EI criterion is parameter-free, but UCB requires us to define a value for β. This value is frequently tuned per-objective, but we instead set it to β_t = 2 log(|D| t² π² / 6δ), which Srinivas et al. (2010) found to be effective. Here, t is the number of function observations made so far and δ is a constant set to 0.5. SOO has no parameters to set. LOGO only requires a list of integers to use as the adaptive schedule W for its local bias parameter. In this work, we set W = {3, 4, 5, 6, 8, 30} to duplicate the value chosen by Kawaguchi et al. (2016) in the work that introduced the LOGO algorithm. BaMSOO and IMGPO both require the use of a GP prior to estimate the upper bound on the objective function's value at certain node locations. Using the GP's estimated mean function μ and standard deviation function σ, we define the LCB and UCB as μ(x) − B_N σ(x) and μ(x) + B_N σ(x), respectively. For BaMSOO we define B_N = √(2 log(π²N²/6η)) as suggested by Wang et al. (2014), and for IMGPO we define B_N = √(2 log(π²N²/12η)) as suggested by Kawaguchi et al. (2015). In both cases, N is the total number of times the GP was used to evaluate a node, and the constant η is set to 0.05 to duplicate the value used in the experimental results of Kawaguchi et al. (2015) and Wang et al. (2014).
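The exploration coefficients quoted above are direct closed-form expressions, so they can be sketched in a few lines of Python. The function names, and the example values |D| = 1000 and N = 50, are our own illustrative choices:

```python
import math

def beta_t(t, domain_size, delta=0.5):
    # GP-UCB coefficient beta_t = 2 log(|D| t^2 pi^2 / (6 delta)),
    # the schedule from Srinivas et al. (2010) used in the paper.
    return 2.0 * math.log(domain_size * t**2 * math.pi**2 / (6.0 * delta))

def b_n(N, eta=0.05, denom=6.0):
    # Confidence-bound width B_N = sqrt(2 log(pi^2 N^2 / (denom * eta))):
    # denom=6 matches the BaMSOO setting, denom=12 the IMGPO setting.
    return math.sqrt(2.0 * math.log(math.pi**2 * N**2 / (denom * eta)))

# A node at location x is then scored with mu(x) - B_N*sigma(x) (LCB)
# or mu(x) + B_N*sigma(x) (UCB).
print(f"beta_10 = {beta_t(t=10, domain_size=1000):.3f}")
print(f"B_50 (BaMSOO) = {b_n(50):.3f}")
print(f"B_50 (IMGPO)  = {b_n(50, denom=12.0):.3f}")
```

Note that both widths grow only logarithmically in t and N, so exploration pressure increases slowly as more observations accumulate, and the larger denominator makes the IMGPO bound slightly tighter than BaMSOO's for the same N.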