Respecting the limit: Bayesian optimization with a bound on the optimal value

Authors: Hanyang Wang, Juergen Branke, Matthias Poloczek

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results on a variety of benchmarks demonstrate the benefit of taking prior information about the optimal value into account, and that the proposed approach significantly outperforms existing techniques. Furthermore, we notice that even in the absence of prior information on the bound, the proposed Slog GP surrogate model still performs better than the standard GP model in most cases, which we explain by its larger expressiveness.
Researcher Affiliation Collaboration Hanyang Wang (EMAIL), Warwick Mathematics Institute, University of Warwick, Coventry, CV4 7AL, United Kingdom; Juergen Branke (EMAIL), Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom; Matthias Poloczek (EMAIL), Amazon, San Francisco, CA 94105, USA
Pseudocode Yes Algorithm 1 BABO (Slog GPb + Slog TEI)
Open Source Code Yes The code is available at https://github.com/HanyangHenry-Wang/BABO.git.
Open Datasets Yes We evaluate the algorithms on eight synthetic functions that are widely used in BO testing as well as three real-world applications. [...] Table 3: Test Function Information

Test Function | Optimal Value | Search Space
GP-generated functions (2D) | | [0, 1]^2
Slog GP-generated functions (2D) | | [0, 1]^2
Beale (2D) | 0 | [-4.5, 4.5]^2
Branin (2D) | 0.397887 | [[-5, 10], [0, 15]]
Six-Hump Camel (2D) | -1.0316 | [[-3, 3], [-2, 2]]
Levy (2D) | 0 | [[-10, 10], [-10, 10]]
Hartmann (3D) | -3.86278 | [0, 1]^d
Dixon-Price (4D) | 0 | [-10, 10]^d
Rosenbrock (4D) | 0 | [-2.048, 2.048]^d
Ackley (6D) | 0 | [-32.768, 32.768]^d
Powell (8D) | 0 | [-4, 5]^d
Styblinski-Tang (10D) | -391.6599 | [-5, 5]^d
PDE Variance (4D) | | [[0.1, 5], [0.1, 5], [0.01, 5], [0.01, 5]]
Robot Push (4D) | 0 | [[-5, 5], [-5, 5], [0, 2π], [0, 300]]
XGBoost Hyperparameter Tuning (6D) | | [[0, 10], [0, 10], [5, 15], [1, 20], [0.5, 1], [0.1, 1]]
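The listed optima are checkable against standard reference implementations. As a sketch (this is the common textbook form of the Branin function, not code from the paper's repository), the tabulated optimal value 0.397887 on the domain [-5, 10] × [0, 15] can be reproduced at a known minimizer:

```python
import math

def branin(x1, x2):
    """Standard Branin function (common reference form, not the paper's code)."""
    a = 1.0
    b = 5.1 / (4 * math.pi ** 2)
    c = 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * math.cos(x1) + s

# One of Branin's three global minimizers is (pi, 2.275):
print(round(branin(math.pi, 2.275), 6))  # 0.397887, matching Table 3
```

The other two minimizers, (-π, 12.275) and (9.42478, 2.475), attain the same value.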
Dataset Splits No For each d-dimensional test function, we sample 4d initial points from a Latin hypercube design. The input domain is scaled to [0, 1]^d. For GP-based methods, function values are standardized (scaled and centralized); for Slog GP-based methods, function values are scaled and the centralizing is done in model training. [...] We test the gap between the model's predictive mean at a random x and its value f(x). [...] We chose four: Skin Segmentation, Bank Note Authentication, Wine Quality, and Breast Cancer. The six hyperparameters are min child weight, colsample bytree, max depth, subsample, alpha, and gamma. Notably, max depth is integer-valued; however, we conduct the search in a continuous space and round it to the nearest integer for evaluation. The objective value is determined by the model's classification error rate on a hold-out dataset.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or other computing resource specifications used for running the experiments.
Software Dependencies No The paper mentions 'restart L-BFGS-B in scipy' but does not specify version numbers for scipy or any other software dependencies required to replicate the experiments.
Experiment Setup Yes The experimental setting is as follows. For each d-dimensional test function, we sample 4d initial points from a Latin hypercube design. The input domain is scaled to [0, 1]^d. For GP-based methods, function values are standardized (scaled and centralized); for Slog GP-based methods, function values are scaled and the centralizing is done in model training. As kernel we use the squared exponential kernel: K(x_a, x_b) = σ² exp(−‖x_a − x_b‖² / (2ℓ²)). For acquisition function optimization, we use restart L-BFGS-B in scipy. The number of restarts (3d), the number of initial samples (30d), and the L-BFGS-B options are the same for all acquisition functions. [...] The hyperparameter β of LCB is set to √(2d log(t²π²)). [...] Additionally, as discussed in Section 3, the hyperparameters of BABO are set to (δ1, δ2, δ3) = (0.1, 0.01, 0.25²). [...] In practice, a small positive noise variance is typically introduced to ensure numerical stability. Since the estimated signal variance σ̂²_g in Slog GP can vary substantially, ranging from near-zero to several hundred, we adopt an adaptive noise variance proportional to the estimated signal variance: σ²_noise = 10⁻⁵ · σ̂²_{g,n−1}. The initial noise variance is set to 6 × 10⁻⁶. We maintain these noise parameter settings across all comparative methods to ensure fair comparison.
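The squared exponential kernel and the adaptive noise rule quoted above are simple to state in code. The sketch below is our own minimal rendering of those two formulas (function names are ours, not from the paper's repository):

```python
import numpy as np

def sq_exp_kernel(Xa, Xb, sigma2=1.0, ell=1.0):
    """K(x_a, x_b) = sigma^2 * exp(-||x_a - x_b||^2 / (2 ell^2)), evaluated pairwise.
    Xa is (n, d), Xb is (m, d); returns an (n, m) kernel matrix."""
    sq_dists = np.sum((Xa[:, None, :] - Xb[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-sq_dists / (2 * ell ** 2))

def adaptive_noise(signal_var_hat):
    """Noise variance tied to the estimated Slog GP signal variance,
    following the paper's rule sigma^2_noise = 1e-5 * sigma_hat^2_g."""
    return 1e-5 * signal_var_hat

K = sq_exp_kernel(np.zeros((1, 2)), np.zeros((1, 2)))
print(K[0, 0])               # 1.0 (sigma^2 at zero distance)
print(adaptive_noise(100.0))  # 0.001
```

With an estimated signal variance of several hundred, the rule yields a noise variance of a few 10⁻³, while the initial value 6 × 10⁻⁶ covers the near-zero regime.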