Bayesian Optimization with Informative Covariance
Authors: Afonso Eduardo, Michael U. Gutmann
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 5, we empirically demonstrate that the proposed methodology can increase the sample efficiency of BO in high-dimensional domains, even under weak prior information. Additionally, we show that it can complement existing methodologies, including GP models with informative mean functions, trust-region optimization and belief-augmented acquisition functions. |
| Researcher Affiliation | Academia | Afonso Eduardo (EMAIL), School of Informatics, University of Edinburgh; Michael U. Gutmann (EMAIL), School of Informatics, University of Edinburgh |
| Pseudocode | Yes | Algorithm 1 Bayesian Optimization (BO). Input: objective function f, acquisition function α, statistical model M, initial evidence D_{n₀}. Repeat: x_{n+1} = arg max α(x \| D_n, M) (find best candidate); y_{n+1} = f(x_{n+1}) (evaluate candidate); D_{n+1} = D_n ∪ {(x_{n+1}, y_{n+1})} (update evidence set); until stopping condition is met. |
| Open Source Code | No | We used GPyTorch (Gardner et al., 2018) to implement all GP models in Python, including the proposed informative (I) covariance functions, cylindrical (C) covariance functions (Oh et al., 2018) and axis-aligned quadratic mean (+QM) (Snoek et al., 2015). Unlike the version provided by GPyTorch, our implementation of C follows the original training and prediction routines, which account for the special treatment of the origin, as discussed in more detail in Appendix C. The paper does not provide an explicit statement of code release or a link to their specific implementation. |
| Open Datasets | Yes | In this application, we optimize a neural network layer, as proposed by Oh et al. (2018). In particular, we perform BO of 100 parameters of a two-layer fully-connected network trained and tested on the MNIST dataset (LeCun, 1998). |
| Dataset Splits | Yes | The model is trained for 10 epochs with a batch size of 512 and evaluated on the test set using the negative log-likelihood loss, while the remaining parameters are learned on the training set with Adam (Kingma & Ba, 2014). |
| Hardware Specification | No | Maximization is performed on the CPU via gradient-based optimization with multiple restarts. (This only mentions CPU generally, without specific models or quantities.) |
| Software Dependencies | No | We used GPyTorch (Gardner et al., 2018) to implement all GP models in Python... The default acquisition function is the Expected Improvement (EI), which is implemented in BoTorch (Balandat et al., 2020)... In terms of implementation, we use pycma (Hansen et al., 2019)... In terms of implementation, the function call that performs interpolation, scipy.interpolate.splprep (Virtanen et al., 2020)... (Only SciPy 1.0 explicitly includes a version number in its reference title, but not the other key software components used.) |
| Experiment Setup | Yes | all models are optimized by L-BFGS-B (Zhu et al., 1997) with a maximum of 1000 iterations per new acquisition and other options set to default values. In addition, the stationary components CS are Matérn (ν = 5/2)... General hyperparameters are given uninformative priors with bounds from (Oh et al., 2018). These include the prior variance σ₀² ~ U(e⁻¹², e²⁰) and lengthscales λ_d ~ U(e⁻¹², 2√D), d ∈ {1, ..., D}... For noise-free objectives, the noise hyperparameter is set to a fixed value σ_y² = 10⁻³. |
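The BO loop quoted in the Pseudocode row can be sketched generically in Python. This is a minimal illustration of Algorithm 1 only: the 1-nearest-neighbour surrogate (`fit_nn`), the random-candidate greedy acquisition (`greedy_acq`), and the 1-D toy objective are placeholders invented for the demo, not the paper's GP models or the EI acquisition from BoTorch.

```python
import numpy as np

def bayesian_optimization(f, acquisition, fit_model, x_init, y_init, n_iter=10):
    """Generic sketch of Algorithm 1: fit model M, maximise the acquisition,
    evaluate the candidate, and update the evidence set D."""
    X, y = list(x_init), list(y_init)
    for _ in range(n_iter):
        model = fit_model(np.asarray(X), np.asarray(y))  # update statistical model M
        x_next = acquisition(model)                      # x_{n+1} = argmax alpha(x | D_n, M)
        y_next = f(x_next)                               # evaluate candidate
        X.append(x_next)                                 # D_{n+1} = D_n U {(x_{n+1}, y_{n+1})}
        y.append(y_next)
    best = int(np.argmin(y))                             # best observed point (minimisation)
    return X[best], y[best]

# Toy demo (hypothetical components, NOT the paper's method): a 1-nearest-
# neighbour "surrogate" with a greedy random-candidate acquisition on a bowl.
rng = np.random.default_rng(0)

def fit_nn(X, y):
    return lambda x: y[np.argmin(np.abs(X.ravel() - x))]

def greedy_acq(model):
    cands = rng.uniform(-2.0, 2.0, size=256)
    return float(cands[np.argmin([model(c) for c in cands])])

x_best, y_best = bayesian_optimization(
    f=lambda x: (x - 1.0) ** 2, acquisition=greedy_acq, fit_model=fit_nn,
    x_init=[-2.0, 0.0, 2.0], y_init=[9.0, 1.0, 1.0], n_iter=20)
```

Because the evidence set only grows, the returned incumbent can never be worse than the best initial observation; in a real run the surrogate would be a GP (e.g., with the paper's informative covariance) and the acquisition would be maximised by gradient-based optimization with restarts.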