Respecting the limit: Bayesian optimization with a bound on the optimal value

Authors: Hanyang Wang, Juergen Branke, Matthias Poloczek

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical results on a variety of benchmarks demonstrate the benefit of taking prior information about the optimal value into account, and that the proposed approach significantly outperforms existing techniques. Furthermore, we notice that even in the absence of prior information on the bound, the proposed Slog GP surrogate model still performs better than the standard GP model in most cases, which we explain by its larger expressiveness.
Researcher Affiliation Collaboration Hanyang Wang (EMAIL), Warwick Mathematics Institute, University of Warwick, Coventry, CV4 7AL, United Kingdom; Juergen Branke (EMAIL), Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom; Matthias Poloczek (EMAIL), Amazon, San Francisco, CA 94105, USA
Pseudocode Yes Algorithm 1 BABO (Slog GPb + Slog TEI)
Open Source Code Yes The code is available at https://github.com/HanyangHenry-Wang/BABO.git.
Open Datasets Yes We evaluate the algorithms on eight synthetic functions that are widely used in BO testing as well as three real-world applications. [...] Table 3: Test Function Information

Test Function | Optimal Value | Search Space
GP-generated functions (2D) | | [0, 1]^2
Slog GP-generated functions (2D) | | [0, 1]^2
Beale (2D) | 0 | [-4.5, 4.5]^2
Branin (2D) | 0.397887 | [[-5, 10], [0, 15]]
Six-Hump Camel (2D) | -1.0316 | [[-3, 3], [-2, 2]]
Levy (2D) | 0 | [[-10, 10], [-10, 10]]
Hartmann (3D) | -3.86278 | [0, 1]^d
Dixon-Price (4D) | 0 | [-10, 10]^d
Rosenbrock (4D) | 0 | [-2.048, 2.048]^d
Ackley (6D) | 0 | [-32.768, 32.768]^d
Powell (8D) | 0 | [-4, 5]^d
Styblinski-Tang (10D) | -391.6599 | [-5, 5]^d
PDE Variance (4D) | | [[0.1, 5], [0.1, 5], [0.01, 5], [0.01, 5]]
Robot Push (4D) | 0 | [[-5, 5], [-5, 5], [0, 2π], [0, 300]]
XGBoost Hyperparameter Tuning (6D) | | [[0, 10], [0, 10], [5, 15], [1, 20], [0.5, 1], [0.1, 1]]
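The listed optima are checkable against standard reference implementations. As a sketch (this is the common textbook form of the Branin function, not code from the paper's repository), the tabulated optimal value 0.397887 on the domain [-5, 10] × [0, 15] can be reproduced at a known minimizer:

```python
import math

def branin(x1, x2):
    """Standard Branin function (common reference form, not the paper's code)."""
    a = 1.0
    b = 5.1 / (4 * math.pi ** 2)
    c = 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * math.cos(x1) + s

# One of Branin's three global minimizers is (pi, 2.275):
print(round(branin(math.pi, 2.275), 6))  # 0.397887, matching Table 3
```

The other two minimizers, (-π, 12.275) and (9.42478, 2.475), attain the same value.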
Dataset Splits No For each d-dimensional test function, we sample 4d initial points from a Latin hypercube design. The input domain is scaled to [0, 1]^d. For GP-based methods, function values are standardized (scaled and centralized); for Slog GP-based methods, function values are scaled and the centralizing is done in model training. [...] We test the gap between the model's predictive mean at a random x and its value f(x). [...] We chose four: Skin Segmentation, Bank Note Authentication, Wine Quality, and Breast Cancer. The six hyperparameters are min child weight, colsample bytree, max depth, subsample, alpha, and gamma. Notably, max depth is integer-valued; however, we conduct the search in a continuous space and round it to the nearest integer for evaluation. The objective value is determined by the model's classification error rate on a hold-out dataset.
Hardware Specification No The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or other computing resource specifications used for running the experiments.
Software Dependencies No The paper mentions 'restart L-BFGS-B in scipy' but does not specify version numbers for scipy or any other software dependencies required to replicate the experiments.
Experiment Setup Yes The experimental setting is as follows. For each d-dimensional test function, we sample 4d initial points from a Latin hypercube design. The input domain is scaled to [0, 1]^d. For GP-based methods, function values are standardized (scaled and centralized); for Slog GP-based methods, function values are scaled and the centralizing is done in model training. As kernel we use the squared exponential kernel: K(x_a, x_b) = σ² exp(−‖x_a − x_b‖² / (2ℓ²)). For acquisition function optimization, we use restart L-BFGS-B in scipy. The number of restarts (3d), the number of initial samples (30d), and the L-BFGS-B options are the same for all acquisition functions. [...] The hyperparameter β of LCB is set to √(2d log(t²π²)). [...] Additionally, as discussed in Section 3, the hyperparameters of BABO are set to (δ1, δ2, δ3) = (0.1, 0.01, 0.25²). [...] In practice, a small positive noise variance is typically introduced to ensure numerical stability. Since the estimated signal variance σ̂²_g in Slog GP can vary substantially, ranging from near-zero to several hundred, we adopt an adaptive noise variance proportional to the estimated signal variance: σ²_noise = 10⁻⁵ · σ̂²_{g,n−1}. The initial noise variance is set to 6 × 10⁻⁶. We maintain these noise parameter settings across all comparative methods to ensure fair comparison.
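The squared exponential kernel and the adaptive noise rule quoted above are simple to state in code. The sketch below is our own minimal rendering of those two formulas (function names are ours, not from the paper's repository):

```python
import numpy as np

def sq_exp_kernel(Xa, Xb, sigma2=1.0, ell=1.0):
    """K(x_a, x_b) = sigma^2 * exp(-||x_a - x_b||^2 / (2 ell^2)), evaluated pairwise.
    Xa is (n, d), Xb is (m, d); returns an (n, m) kernel matrix."""
    sq_dists = np.sum((Xa[:, None, :] - Xb[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-sq_dists / (2 * ell ** 2))

def adaptive_noise(signal_var_hat):
    """Noise variance tied to the estimated Slog GP signal variance,
    following the paper's rule sigma^2_noise = 1e-5 * sigma_hat^2_g."""
    return 1e-5 * signal_var_hat

K = sq_exp_kernel(np.zeros((1, 2)), np.zeros((1, 2)))
print(K[0, 0])               # 1.0 (sigma^2 at zero distance)
print(adaptive_noise(100.0))  # 0.001
```

With an estimated signal variance of several hundred, the rule yields a noise variance of a few 10⁻³, while the initial value 6 × 10⁻⁶ covers the near-zero regime.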