Unleashing the Potential of Acquisition Functions in High-Dimensional Bayesian Optimization
Authors: Jiayu Zhao, Renyu Yang, Shenghao Qiu, Zheng Wang
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper investigates a largely understudied problem concerning the impact of AF maximizer initialization on exploiting an AF's capability. Our large-scale empirical study shows that the widely used random initialization strategy often fails to harness the potential of an AF. In light of this, we propose a better initialization approach by employing multiple heuristic optimizers to leverage the historical data of black-box optimization to generate initial points for the AF maximizer. We evaluate our approach with a range of heavily studied synthetic functions and real-world applications. Experimental results show that our techniques, while simple, can significantly enhance the standard BO and outperform state-of-the-art methods by a large margin in most test cases. |
| Researcher Affiliation | Academia | Jiayu Zhao, Renyu Yang, Shenghao Qiu, Zheng Wang: School of Computing, University of Leeds |
| Pseudocode | Yes | Algorithm 1 Acquisition function maximizer initialization for high-dimensional Bayesian optimization (AIBO) |
| Open Source Code | Yes | Data availability: The data and code associated with this paper are openly available at https://github.com/gloaming2dawn/AIBO. |
| Open Datasets | Yes | Table 1: Benchmarks used in evaluation (function/task, #dimensions, search range). Synthetic: Ackley, 20/100/300, [-5, 10]; Rosenbrock, 20/100/300, [-5, 10]; Rastrigin, 20/100/300, [-5.12, 5.12]; Griewank, 20/100/300, [-10, 10]; Levy, 20/100/300, [-600, 600]. Robotics: Robot pushing, 14, /; Rover trajectory planning, 60, [0, 1]; Half-Cheetah locomotion, 102, [-1, 1]. Robot pushing: The task is used in TuRBO (Eriksson et al., 2019) and Wang et al. (2018) to validate high-dimensional BOs. Rover trajectory planning: The task, also considered in Eriksson et al. (2019) and Wang et al. (2018), is to maximize the trajectory of a rover over rough terrain. Half-cheetah robot locomotion: We consider the 102D half-cheetah robot locomotion task simulated in MuJoCo (Todorov et al., 2012) and use the linear policy a = Ws introduced in Mania et al. (2018) to control the robot walking. Lasso-DNA (Šehić et al., 2021) and Nasbench (Ying et al., 2019). |
| Dataset Splits | No | The paper mentions using N = 50 samples to obtain an initial dataset D0, but does not specify further dataset splits for training, validation, or testing in a conventional supervised learning sense. The tasks involve optimizing black-box functions, not typical classification/regression with predefined splits. |
| Hardware Specification | Yes | The experiments are run on an NVIDIA RTX 3090 GPU equipped with a 20-core Intel Xeon Gold 5218R CPU Processor. |
| Software Dependencies | Yes | We use the implementations in pycma (Hansen et al., 2022) and pymoo (Blank & Deb, 2020) for the CMA-ES and the GA initialization strategies, respectively. |
| Experiment Setup | Yes | We select the Matérn-5/2 kernel with ARD (each input dimension has a separate length scale) and a constant mean function to parameterize our GP model. The model parameters are fitted by optimizing the log-marginal likelihood before proposing a new batch of samples for evaluation. Following the usual GP fitting procedure, we re-scale the input domain to [0, 1]^d. We also apply power transforms to the function values to make the data more Gaussian-like. This transformation is useful for highly skewed functions like Rosenbrock and has been proven effective in real-world applications (Cowen-Rivers et al., 2020). We use the following bounds for the model parameters: length-scale λ_i ∈ [0.005, 20.0], noise variance σ² ∈ [1e-6, 0.01]. We use N = 50 samples to obtain the initial dataset D0 for all benchmarks. We set k = 500 and n = 1 for each AF maximizer initialisation strategy. We use the implementations in pycma (Hansen et al., 2022) and pymoo (Blank & Deb, 2020) for the CMA-ES and the GA initialization strategies, respectively. For CMA-ES, we set the initial standard deviation to 0.2. For GA initialization, we set the population size to 50. The default AF maximizer in AIBO is the gradient-based optimization implemented in BoTorch. The default AF is UCB with β_t = 1.96 (default setting in the skopt library (Head et al., 2021)), and the default batch size is set to 10. |
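
The settings quoted in the Experiment Setup row can be collected in a small, standard-library-only Python sketch. This is an illustration under stated assumptions, not the authors' implementation (which relies on BoTorch, pycma, and pymoo): the names `SetupConfig`, `rescale_to_unit_cube`, `matern52_ard`, and `ucb` are hypothetical, the kernel's output scale is an added parameter, and the UCB form `mean + beta * std` follows the skopt convention of multiplying the standard deviation directly, which may differ from the paper's exact formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    n_initial_samples: int = 50            # N, size of the initial dataset D0
    k_candidates: int = 500                # k, per AF maximizer initialisation strategy
    n_selected: int = 1                    # n, per AF maximizer initialisation strategy
    cma_es_sigma0: float = 0.2             # initial standard deviation for CMA-ES
    ga_population_size: int = 50           # population size for GA initialization
    ucb_beta: float = 1.96                 # beta_t for the default UCB acquisition
    batch_size: int = 10                   # default batch size
    lengthscale_bounds: tuple = (0.005, 20.0)
    noise_variance_bounds: tuple = (1e-6, 0.01)


def rescale_to_unit_cube(x, lower, upper):
    """Map each coordinate from [lower, upper] to [0, 1]."""
    return [(xi - lower) / (upper - lower) for xi in x]


def matern52_ard(x1, x2, lengthscales, outputscale=1.0):
    """Matern-5/2 kernel with one length scale per input dimension (ARD)."""
    # Scaled Euclidean distance: each dimension is divided by its own length scale.
    r = math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales)))
    s = math.sqrt(5.0) * r
    return outputscale * (1.0 + s + s * s / 3.0) * math.exp(-s)


def ucb(mean, std, beta=1.96):
    """Upper confidence bound. ASSUMPTION: beta multiplies std directly."""
    return mean + beta * std
```

For example, `rescale_to_unit_cube(x, -5.0, 10.0)` maps an Ackley or Rosenbrock input from the search range listed in Table 1 to the unit cube before the kernel is evaluated. If the paper defines UCB with a square root on β_t, only the body of `ucb` needs to change.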