Understanding High-Dimensional Bayesian Optimization
Authors: Leonard Papenmeier, Matthias Poloczek, Luigi Nardi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Lund University, Lund, Sweden ²Amazon (This research does not relate to Matthias' work at Amazon.) ³DBtune. Correspondence to: Leonard Papenmeier <EMAIL>. |
| Pseudocode | No | The paper describes methods and equations for Gaussian Processes and Bayesian Optimization in Appendix A, but it does not present any structured pseudocode or algorithm blocks with numbered steps. |
| Open Source Code | No | The paper references BoTorch (Balandat et al., 2020) and provides its GitHub link: "URL https://github.com/pytorch/botorch/tree/v0.12.0. Last access: Jan 16, 2025." However, this is a third-party framework used by the authors, not the explicit release of their own implementation code for the MSR method described in the paper. |
| Open Datasets | Yes | Our benchmarks are the 124-dimensional soft-constrained version of the Mopta08 benchmark (Jones, 2008) introduced by Eriksson & Jankowiak (2021), the 180-dimensional Lasso-DNA (Šehić et al., 2022), the 388-dimensional SVM (Eriksson & Jankowiak, 2021), the 60-dimensional Rover (Eriksson et al., 2019), and two 888- and 6392-dimensional Mujoco benchmarks used by Hvarfner et al. (2024). |
| Dataset Splits | No | The paper mentions initial sampling for Bayesian Optimization: "each initialized with 10 random samples in the design of experiments (DOE) phase and subsequently optimized with Log EI and RAASP sampling for 20 iterations." This describes how the BO process is initialized with samples, but it does not specify traditional training/test/validation splits for a static dataset, which is what the question is asking for. |
| Hardware Specification | No | The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. This is a general reference to a computing infrastructure but does not provide specific details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | Yes | To optimize the AF, they change BoTorch's (Balandat et al., 2020) default strategy... The orange distributions in Fig. 6 show the average length scales obtained by MAP with a Gamma(3, 6) prior, which has been the default in BoTorch before version 0.12.0... URL https://github.com/pytorch/botorch/tree/v0.12.0. Last access: Jan 16, 2025. |
| Experiment Setup | Yes | We propose a simple initialization for the gradient-based optimizer used for fitting the length scales of the Gaussian process (GP) surrogate via MLE and evaluate its performance for BO tasks. In what follows, we suppose a BO algorithm with a 5/2-ARD-Matérn kernel and Log EI (Ament et al., 2024). To address the issue of vanishing gradients at the start of the MLE optimization, we choose the initial length scale as 0.1 and scale with √d to account for the increasing distances of the randomly sampled design of experiments (DOE) points. Thus, the initial length scale used in the optimization is 0.1·√d. |
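The experiment-setup quote above pins down one concrete detail: the MLE optimizer's initial length scale is 0.1·√d, chosen because pairwise distances between random DOE points in [0,1]^d grow with √d. A minimal sketch of that initialization, with a small empirical check of the distance-scaling argument (function names are illustrative, not from the paper's code):

```python
import math
import random

def msr_initial_lengthscale(d: int) -> float:
    # MSR-style initialization (sketch): scale the 0.1 base length scale
    # by sqrt(d), matching how pairwise distances of uniform random points
    # in [0,1]^d grow with the dimension d.
    return 0.1 * math.sqrt(d)

def mean_pairwise_distance(d: int, n: int = 64, seed: int = 0) -> float:
    # Empirical check: average Euclidean distance between n random DOE
    # points in [0,1]^d; this grows roughly like sqrt(d / 6).
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(dists) / len(dists)
```

For the paper's 124-dimensional Mopta08 benchmark this gives an initial length scale of about 1.11, far larger than the 0.1 default, which keeps the Matérn kernel from saturating (and its MLE gradients from vanishing) on the widely spread DOE points.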