Distributionally Ambiguous Optimization for Batch Bayesian Optimization

Authors: Nikitas Rontsis, Michael A. Osborne, Paul J. Goulart

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we demonstrate the effectiveness of our acquisition function against a number of state-of-the-art alternatives. The acquisition functions we consider are listed in Table 1. We do not compare against PPES as it is substantially more expensive and elaborate than our approach and there is no publicly available implementation of this method. We show that our acquisition function OEI achieves better performance than alternatives and highlight simple failure cases exhibited by competing methods. In making the following comparisons, extra care should be taken in the setup used. This is because Bayesian Optimization is a multifaceted procedure that depends on a collection of disparate elements (e.g. kernel/mean function choice, normalization of data, acquisition function, optimization of the acquisition function) each of which can have a considerable effect on the resulting performance (Snoek et al., 2012; Shahriari et al., 2016). For this reason we test ...
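The quoted passage stresses that batch BO performance hinges on several interacting components (model, normalization, acquisition, acquisition optimization). A minimal illustrative skeleton of one batch BO iteration may make those moving parts concrete; everything here (the toy objective, the squared-exponential GP stand-in, the LCB acquisition) is a hypothetical placeholder, not the paper's OEI method or its GPflow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Toy 1-D objective standing in for an expensive black-box function.
    return np.sin(3.0 * x) + 0.5 * x**2

def gp_posterior(X, y, Xq, length_scale=0.5):
    # Simplified GP regression posterior with a squared-exponential kernel,
    # a stand-in for the Matern-3/2 GPflow model used in the paper.
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / length_scale**2)
    K = k(X, X) + 1e-6 * np.eye(len(X))   # small fixed likelihood variance
    Ks = k(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.maximum(var, 1e-12)

def lcb(mu, var, beta=2.0):
    # Lower-confidence-bound acquisition (a simple proxy, not OEI).
    return mu - beta * np.sqrt(var)

# One iteration: fit on observed data, then pick a batch of q points that
# minimize the acquisition over a set of random candidates.
X = rng.uniform(-1, 1, size=10)
y = objective(X)
cand = rng.uniform(-1, 1, size=500)
mu, var = gp_posterior(X, y, cand)
batch = cand[np.argsort(lcb(mu, var))[:5]]   # batch size q = 5
```

In practice each of these placeholders (kernel, acquisition, candidate search) is exactly the kind of design choice the quoted passage warns can dominate the comparison.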
Researcher Affiliation | Academia | Nikitas Rontsis (EMAIL), Michael A. Osborne (EMAIL), Paul J. Goulart (EMAIL), Department of Engineering Science, University of Oxford, Oxford, OX1 3PN, UK
Pseudocode | No | The paper describes algorithms and methods using mathematical notation and textual descriptions, but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | We test the different algorithms on a unified testing framework, based on GPflow, available online at https://github.com/oxfordcontrol/Bayesian-Optimization.
Open Datasets | Yes | The functions considered are: the Six-Hump Camel function defined on [−2, 2] × [−1, 1], the Hartmann 6-dimensional function defined on [0, 1]^6 and the Eggholder function, defined on [−512, 512]^2. Finally we perform Bayesian Optimization to tune Proximal Policy Optimization (PPO), a state-of-the-art Deep Reinforcement Learning algorithm that has been shown to outperform several policy gradient reinforcement learning algorithms (Schulman et al., 2017). ... We tune the reinforcement algorithm on 4 OpenAI Gym tasks (Hopper, Inverted Double Pendulum, Reacher and Inverted Pendulum Swingup) using the Roboschool robot simulator.
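The benchmark functions named in this row have standard closed forms, so they are easy to reproduce directly. A sketch of two of them, using the widely published formulas and optima (these definitions come from the standard test-function literature, not from the paper's code):

```python
import numpy as np

def six_hump_camel(x1, x2):
    # Six-Hump Camel, defined on [-2, 2] x [-1, 1];
    # global minimum f* ~ -1.0316 at (0.0898, -0.7126) and its mirror point.
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

def eggholder(x1, x2):
    # Eggholder, defined on [-512, 512]^2;
    # global minimum f* ~ -959.64 at (512, 404.2319).
    return (-(x2 + 47) * np.sin(np.sqrt(abs(x1 / 2 + x2 + 47)))
            - x1 * np.sin(np.sqrt(abs(x1 - (x2 + 47)))))
```

Both are common, freely available synthetic benchmarks, which supports the "Open Datasets: Yes" assessment.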
Dataset Splits | No | The initial dataset consists of 10 random points for all the functions. ... We run Bayesian Optimization with batch size of 20, with the same modeling, preprocessing and optimization choices as the ones used in the benchmark functions. The results of 20 runs are depicted in Figure 4. The paper specifies initial random points and training duration but does not provide specific train/test/validation splits or cross-validation details for reproducibility.
Hardware Specification | Yes | Table 2: Average execution time of the acquisition function, its gradient and Hessian when running BO in the Eggholder function on an Intel E5-2640v3 CPU.
Software Dependencies | Yes | We chose the KNITRO v10.3 (Byrd et al., 2006) Sequential Quadratic Optimization (SQP) non-linear solver with the default parameters for the optimization of OEI.
Experiment Setup | Yes | The initial dataset consists of 10 random points for all the functions. A Matern 3/2 kernel is used for the GP modeling (Rasmussen and Williams, 2005). As all of the considered functions are noiseless, we set the likelihood variance to a fixed small number 10^−6 for numerical reasons. For the purpose of generality, the input domain of every function is scaled to [−0.5, 0.5]^n while the observation dataset y_D is normalized at each iteration, such that Var[y_D] = 1. The same transformations are applied to QEI, LP-EI and BLCB for reasons of consistency. All the acquisition functions except OEI are optimized with the quasi-Newton L-BFGS-B algorithm (Fletcher, 1987) with 20 random restarts. ... Finally we perform Bayesian Optimization to tune Proximal Policy Optimization (PPO)... We tune a set of 5 hyper-parameters which are listed in Table 3. We define as objective function the negative average reward per episode over the entire training period (4 × 10^5 timesteps).
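The preprocessing and acquisition-optimization choices quoted in this row (inputs rescaled to [−0.5, 0.5]^n, observations normalized to unit variance, L-BFGS-B with 20 random restarts) can be sketched as follows. The acquisition surface here is a placeholder quadratic, not the paper's OEI, and the helper names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def scale_inputs(X, lo, hi):
    # Map each coordinate from [lo_i, hi_i] to [-0.5, 0.5].
    return (X - lo) / (hi - lo) - 0.5

def normalize_observations(y):
    # Rescale so that Var[y] = 1 (the paper re-normalizes every iteration).
    return y / y.std()

def multistart_minimize(f, bounds, n_restarts=20):
    # Run L-BFGS-B from n_restarts random starting points; keep the best.
    best = None
    for _ in range(n_restarts):
        x0 = rng.uniform(bounds[:, 0], bounds[:, 1])
        res = minimize(f, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Example: normalize some observations, then optimize a placeholder
# acquisition over the scaled domain [-0.5, 0.5]^2.
y = rng.normal(3.0, 2.0, size=10)
y_unit = normalize_observations(y)
bounds = np.array([[-0.5, 0.5], [-0.5, 0.5]])
acq = lambda x: np.sum((x - 0.1)**2)        # hypothetical acquisition surface
result = multistart_minimize(acq, bounds)
```

This mirrors the setup the paper applies uniformly to QEI, LP-EI and BLCB; OEI itself is instead optimized with the KNITRO SQP solver, per the row above.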