Bayesian Optimization via Continual Variational Last Layer Training

Authors: Paul Brunzema, Mikkel Jordahn, John Willes, Sebastian Trimpe, Jasper Snoek, James Harrison

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose an approach which shows competitive performance on many problem types, including some that BNNs typically struggle with. We build on variational Bayesian last layers (VBLLs), and connect training of these models to exact conditioning in GPs. We exploit this connection to develop an efficient online training algorithm that interleaves conditioning and optimization. Our findings suggest that VBLL networks significantly outperform GPs and other BNN architectures on tasks with complex input correlations, and match the performance of well-tuned GPs on established benchmark tasks. ... Figure 3: Classic benchmarks (top) and high-dimensional and non-stationary benchmarks (bottom). Performance of all surrogates for log EI (top) and TS (bottom). ... Figure 5: Multi-objective benchmarks. Performance of all surrogate models using log EHVI and VBLLs with TS.
Researcher Affiliation | Collaboration | 1 RWTH Aachen University, 2 Technical University of Denmark, 3 Vector Institute, 4 Google DeepMind
Pseudocode | Yes | Algorithm 1: VBLL Bayesian Optimization Loop with Continual Variational Last Layer Training
Open Source Code | No | The paper states: "We implement the VBLLs within BoTorch (Balandat et al., 2020). We further build on the implementation of Li et al. (2024) for the different baselines which are also based on BoTorch as well as GPyTorch (Gardner et al., 2018)." This refers to the use of third-party frameworks, not the release of the authors' specific implementation code for this paper's methodology.
Open Datasets | Yes | We evaluate the performance of the VBLL surrogate model on various standard benchmarks and three more complex optimization problems... Our results on the 200D NNdraw benchmark (Li et al., 2024), the real-world 25D Pest Control benchmark (Oh et al., 2019), and the 12D Lunar Lander benchmark (Eriksson et al., 2019) are shown in Figure 3 (bottom). ... Here, we consider the standard benchmarks Branin-Currin (D = 2, K = 2), DTLZ1 (D = 5, K = 2), DTLZ2 (D = 5, K = 2), and the real-world benchmark Oil Sorbent (D = 7, K = 3) (Wang et al., 2020; Li et al., 2024).
Dataset Splits | No | The paper does not provide train/test/validation dataset splits. In Bayesian optimization, data is acquired sequentially rather than being split from a pre-existing dataset. The paper specifies only the initialization: "In all subsequent experiments, we select the number of initial points for the single objective benchmarks equal to the input dimensionality D and for the multi-objective benchmarks we use 2(D + 1) initial points (Daulton et al., 2020; Balandat et al., 2020)." This describes initialization for the BO process, not dataset splits.
Hardware Specification | No | The paper states in the acknowledgments: "Simulations were performed in part with computing resources granted by RWTH Aachen University under projects rwth1579 and p0022034." This is a general statement about computing resources and does not provide specific hardware details (e.g., CPU/GPU models, memory).
Software Dependencies | No | The paper mentions: "We implement the VBLLs within BoTorch (Balandat et al., 2020)... as well as GPyTorch (Gardner et al., 2018)." and "For all experiments, we use AdamW (Loshchilov & Hutter, 2017) as our optimizer". However, it does not provide version numbers for BoTorch, GPyTorch, or the underlying PyTorch installation, which are required for reproducible software dependencies.
Experiment Setup | Yes | For training the VBLL models, we closely follow Harrison et al. (2024). For all experiments, we use AdamW (Loshchilov & Hutter, 2017) as our optimizer with a learning rate of 10⁻³, set the weight decay for the backbone (not including the parameters of the VBLL) to 10⁻⁴, and use norm-based gradient clipping with a value of 1. For the VBLL, we set the prior scale to 1 and the Wishart scale to 0.01. ... We track the average loss of a training epoch and if this average loss does not improve for 100 epochs in a row, we stop training and use the model parameters that yielded the lowest training loss.
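The early-stopping rule quoted above (halt once the average epoch loss has not improved for 100 consecutive epochs, then restore the best parameters) can be sketched in plain Python. This is an illustrative sketch, not the authors' released code; the class name, the `patience` default, and the checkpoint hook are assumptions.

```python
class PatienceStopper:
    """Tracks the average epoch loss and signals a stop after `patience`
    consecutive epochs without improvement (default 100, as in the paper)."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None          # epoch whose parameters would be restored
        self.epochs_without_improvement = 0

    def update(self, epoch, avg_epoch_loss):
        """Record one epoch's average loss; return True if training should stop."""
        if avg_epoch_loss < self.best_loss:
            self.best_loss = avg_epoch_loss
            self.best_epoch = epoch     # in practice, checkpoint the model here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In a training loop one would call `stopper.update(epoch, avg_loss)` after each epoch, break when it returns `True`, and reload the checkpoint saved at `best_epoch`, which matches the paper's description of using the parameters with the lowest training loss.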