Are We Forgetting about Compositional Optimisers in Bayesian Optimisation?

Authors: Antoine Grosnit, Alexander I. Cowen-Rivers, Rasul Tutunov, Ryan-Rhys Griffiths, Jun Wang, Haitham Bou-Ammar

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark."
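The compositional recasting described above targets objectives of the nested form f(E_w[g_w(x)]), which can be attacked with two-timescale stochastic compositional gradient updates of the kind surveyed in that literature. The following is a minimal toy sketch of such an update on a made-up quadratic problem; the functions, step sizes, and iteration count are all illustrative and are not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy compositional objective: minimise f(E_w[g_w(x)]) over x, with
# g_w(x) = x + w for w ~ N(0, I) and f(u) = ||u||^2, so the optimum is x = 0.

def g(x, w):
    return x + w            # noisy inner map

def g_jac(x, w):
    return np.eye(x.size)   # Jacobian of g with respect to x

def f_grad(u):
    return 2.0 * u          # gradient of the outer function f(u) = ||u||^2

x = np.array([3.0, -2.0])
u = np.zeros_like(x)        # running estimate of the inner expectation E_w[g_w(x)]

for t in range(2000):
    w = rng.standard_normal(x.shape)
    beta = 0.05             # fast timescale: track the inner expectation
    u = (1 - beta) * u + beta * g(x, w)
    alpha = 0.01            # slow timescale: descend the outer objective
    x = x - alpha * g_jac(x, w).T @ f_grad(u)

print(x)  # close to the optimum at the origin
```

The key feature is that the gradient of the composition is never computed from a single unbiased sample; instead the auxiliary iterate `u` tracks the inner expectation on a faster timescale while `x` moves on a slower one.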
Researcher Affiliation | Collaboration | Antoine Grosnit (EMAIL), Alexander I. Cowen-Rivers (EMAIL) and Rasul Tutunov (EMAIL), Huawei R&D UK; Ryan-Rhys Griffiths (EMAIL), Huawei R&D UK and University of Cambridge; Jun Wang (EMAIL); Haitham Bou-Ammar (EMAIL), Huawei R&D UK and University College London
Pseudocode | Yes | "Following the introduction of GP surrogate models and acquisition functions, we are now ready to present a canonical template for the Bayesian optimisation algorithm. The main steps are summarised in the pseudocode of Algorithm 1." (Algorithm 1: Batched Bayesian Optimisation with GPs)
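As a rough illustration of that canonical template, here is a self-contained sketch of a BO loop with a batch size of one, a fixed-lengthscale RBF-kernel GP, and UCB as the acquisition; all names, constants, and the toy objective are illustrative stand-ins, not the paper's Algorithm 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # black-box function to maximise (toy stand-in)
    return np.sin(3 * x) + 0.5 * x

def rbf(a, b, ls=0.3):
    # squared-exponential kernel on 1-D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-3):
    # exact GP regression posterior mean and standard deviation at Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

# initialise the surrogate with points drawn uniformly at random
X = rng.uniform(-2, 2, size=3)
y = objective(X)

for k in range(10):                          # acquisition steps
    cand = rng.uniform(-2, 2, size=512)      # random candidate set
    mu, sigma = gp_posterior(X, y, cand)
    ucb = mu + 2.0 * sigma                   # UCB acquisition values
    x_next = cand[np.argmax(ucb)]            # maximise the acquisition
    X = np.append(X, x_next)                 # query the objective and
    y = np.append(y, objective(x_next))      # augment the data set

print(X[np.argmax(y)], y.max())
```

The paper's template generalises this loop to batches of q query points per step, which is what makes the acquisition a function over the q x d product space discussed later in the setup.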
Open Source Code | Yes | An open-source implementation is made available at https://github.com/huawei-noah/noah-research/tree/CompBO/BO/HEBO/CompBO.
Open Datasets | Yes | The Bayesmark tasks consist of both regression and classification tasks on the Boston and Diabetes UCI data sets (Dua and Graff, 2017), respectively.
Dataset Splits | No | The paper states: "At the beginning of each experiment, three points are drawn uniformly at random within the search domain to initialise the surrogate model." and, for Bayesmark tasks: "We sample 2D points uniformly at random to initialise the model." While it mentions initialisation and batch sizes, it does not explicitly provide percentages, sample counts, or a methodology for the train/validation/test splits of the underlying data sets themselves (e.g., the Boston or Diabetes UCI data sets).
Hardware Specification | No | The paper states: "Execution time of UCB maximisation run on 4 CPUs." This specifies the number of CPUs but gives no models, types, or other details of the CPUs or of any other hardware used.
Software Dependencies | No | The paper mentions several software libraries and frameworks: "GPyTorch (Gardner et al., 2018)", "scipy (Virtanen et al., 2020)", "PyTorch (Paszke et al., 2019)", the "pymoo library (Blank and Deb, 2020)", and the "BoTorch library (Balandat et al., 2020)". However, it does not provide version numbers for any of these components.
Experiment Setup | Yes | At each acquisition step k, the hyperparameters of the GP kernel are estimated from the current observed input-output pairs Dk by optimising the negative log marginal likelihood with a Gamma prior over θ. To ensure fairness in performance comparison, the same number of optimisation steps T (set to 64) and minibatch size m (set to 128) are used for each method at each acquisition step. As acquisition function maximisation is a non-convex problem, it is sensitive to the initialisation set. We therefore use multiple restart points (Wang et al., 2020), obtained by first drawing 1024 batches uniformly at random in the modified search space [0, 1]^(q×d) and then applying the default heuristic from (Balandat et al., 2020) to select only 32 promising initialisation batches. Consequently, at each inner optimisation step of BO, the Random Search optimisation strategy is granted 32·T·m evaluations of the acquisition function at random batches. Similarly, CMA-ES and DE are run for 64 evolution and mutation steps, and the aforementioned initialisation strategy is used to generate the 32 members of the initial population.
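The restart-selection step in this setup (1024 random q x d batches filtered down to 32 promising ones) can be sketched as follows. The acquisition function here is a placeholder, and the softmax-weighted sampling is a simplified stand-in for the BoTorch initialisation heuristic, which similarly favours high-scoring candidates without hard top-k truncation:

```python
import numpy as np

rng = np.random.default_rng(2)

q, d = 4, 2  # batch size and input dimension (illustrative values)

def acquisition(batch):
    # placeholder score: real code would evaluate the acquisition on the GP
    return -np.sum((batch - 0.5) ** 2)

# 1) draw 1024 candidate batches uniformly at random in [0, 1]^(q x d)
raw = rng.uniform(size=(1024, q, d))
scores = np.array([acquisition(b) for b in raw])

# 2) keep 32 promising batches, sampling in proportion to a softmax of
#    standardised acquisition values rather than taking a hard top-32
z = (scores - scores.mean()) / (scores.std() + 1e-9)
probs = np.exp(z) / np.exp(z).sum()
keep = rng.choice(1024, size=32, replace=False, p=probs)
restarts = raw[keep]

print(restarts.shape)  # (32, 4, 2)
```

The stochastic selection keeps some diversity among restart points, which matters precisely because the acquisition landscape is non-convex and a hard top-k would tend to pick 32 near-duplicates of the same basin.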