Understanding High-Dimensional Bayesian Optimization

Authors: Leonard Papenmeier, Matthias Poloczek, Luigi Nardi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.
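The vanishing-gradient claim above can be illustrated numerically. The sketch below is not the paper's code: it uses an RBF kernel (rather than the paper's Matérn-5/2) purely because its length-scale derivative has a simple closed form, and the point counts and length scale are illustrative assumptions. With a unit length scale, squared distances between random points in [0, 1]^d grow like d/6, so off-diagonal kernel entries, and hence the MLE gradient with respect to the length scale, collapse toward zero as d grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_offdiag_and_grad(d, lengthscale, n=32):
    """Mean off-diagonal RBF kernel value and its derivative w.r.t. the
    length scale, for n random points in [0, 1]^d."""
    x = rng.random((n, d))
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    k = np.exp(-sq / (2 * lengthscale ** 2))              # RBF kernel matrix
    dk = k * sq / lengthscale ** 3                        # dK / d(lengthscale)
    mask = ~np.eye(n, dtype=bool)                         # drop the diagonal
    return k[mask].mean(), dk[mask].mean()

# off-diagonal kernel values and their length-scale gradients shrink
# rapidly with dimension, leaving gradient-based MLE without signal
for d in (2, 100, 1000):
    k, dk = rbf_offdiag_and_grad(d, lengthscale=1.0)
    print(f"d={d:5d}  mean k={k:.2e}  mean dK/dl={dk:.2e}")
```

In low dimension the kernel entries are O(1) and carry a usable gradient; by d = 1000 both are numerically zero, which is the failure mode the paper attributes to common GP initialization schemes.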
Researcher Affiliation | Collaboration | ¹Department of Computer Science, Lund University, Lund, Sweden; ²Amazon (This research does not relate to Matthias's work at Amazon.); ³DBtune. Correspondence to: Leonard Papenmeier <EMAIL>.
Pseudocode | No | The paper describes methods and equations for Gaussian processes and Bayesian optimization in Appendix A, but it does not present any structured pseudocode or algorithm blocks with numbered steps.
Open Source Code | No | The paper references BoTorch (Balandat et al., 2020) and provides its GitHub link: "URL https://github.com/pytorch/botorch/tree/v0.12.0. Last access: Jan 16, 2025." However, this is a third-party framework used by the authors, not an explicit release of their own implementation of the MSR method described in the paper.
Open Datasets | Yes | Our benchmarks are the 124-dimensional soft-constrained version of the Mopta08 benchmark (Jones, 2008) introduced by Eriksson & Jankowiak (2021), the 180-dimensional Lasso-DNA (Šehić et al., 2022), the 388-dimensional SVM (Eriksson & Jankowiak, 2021), the 60-dimensional Rover (Eriksson et al., 2019), and two 888- and 6392-dimensional MuJoCo benchmarks used by Hvarfner et al. (2024).
Dataset Splits | No | The paper mentions initial sampling for Bayesian optimization: "each initialized with 10 random samples in the design of experiments (DOE) phase and subsequently optimized with LogEI and RAASP sampling for 20 iterations." This describes how the BO process is initialized with samples, but it does not specify traditional training/test/validation splits for a static dataset, which is what the question asks for.
Hardware Specification | No | The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. This is a general reference to a computing infrastructure and does not provide specific details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | Yes | To optimize the AF, they change BoTorch's (Balandat et al., 2020) default strategy... The orange distributions in Fig. 6 show the average length scales obtained by MAP with a Gamma(3, 6) prior, which had been the default in BoTorch before version 0.12.0... URL https://github.com/pytorch/botorch/tree/v0.12.0. Last access: Jan 16, 2025.
Experiment Setup | Yes | We propose a simple initialization for the gradient-based optimizer used for fitting the length scales of the Gaussian process (GP) surrogate via MLE and evaluate its performance for BO tasks. In what follows, we suppose a BO algorithm with a Matérn-5/2 ARD kernel and LogEI (Ament et al., 2024). To address the issue of vanishing gradients at the start of the MLE optimization, we choose the initial length scale as 0.1 and scale with √d to account for the increasing distances of the randomly sampled design of experiments (DOE) points. Thus, the initial length scale used in the optimization is 0.1√d.
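The √d scaling in the setup above follows from how pairwise distances behave in the unit hypercube: the expected squared distance between two uniform points in [0, 1]^d is d/6, so typical distances grow like √d. A minimal numpy sketch of this fact and of the reported initialization rule (the `initial_lengthscale` helper and sample counts are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_lengthscale(d):
    # the initialization rule as reported: start MLE from 0.1 scaled by sqrt(d)
    return 0.1 * np.sqrt(d)

# E[||x - y||^2] = d/6 for x, y ~ Uniform([0,1]^d), so distances ~ sqrt(d/6):
# matching the initial length scale to this growth keeps kernel entries,
# and hence MLE gradients, away from zero at the start of fitting
for d in (10, 100, 1000):
    x, y = rng.random((2, 256, d))
    mean_dist = np.linalg.norm(x - y, axis=1).mean()
    print(f"d={d:5d}  mean dist={mean_dist:6.2f}  "
          f"sqrt(d/6)={np.sqrt(d / 6):6.2f}  l0={initial_lengthscale(d):5.2f}")
```

The printed mean distances track √(d/6) closely, which is the geometric justification for scaling the starting length scale with √d rather than leaving it dimension-independent.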