Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees
Authors: William J. Wilkinson, Simo Särkkä, Arno Solin
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our proposed methods, we consider three case studies. These include likelihood models that are a nonlinear function of multiple latent Gaussian processes, and which consistently result in non-PSD covariances when inference is applied naïvely. ... We compute the negative log predictive density (NLPD) of the test data as the main performance metric. ... Figure 2: Example heteroscedastic noise model results. The top figure compares the posterior obtained for the motorcycle crash data set when using heuristic VI, variational Gauss-Newton and variational quasi-Newton. ... Figure 3: Heteroscedastic noise results. Mean of 4-fold cross validation shown. |
| Researcher Affiliation | Academia | William J. Wilkinson EMAIL Department of Computer Science Aalto University Finland; Simo Särkkä EMAIL Department of Electrical Engineering and Automation Aalto University Finland; Arno Solin EMAIL Department of Computer Science Aalto University Finland |
| Pseudocode | Yes | Appendix H. Full Algorithm Descriptions. Here we outline the exact algorithms used to perform inference in GPs, sparse GPs and Markovian GPs. ... Algorithm 1 Gaussian process inference. ... Algorithm 2 Stochastic sparse GP inference. ... Algorithm 3 State space model / Markovian GP inference. |
| Open Source Code | Yes | Python code for the methods and experiments is provided at https://github.com/AaltoML/BayesNewton (see Appendix I for instructions on how to reproduce the results). |
| Open Datasets | Yes | The model is applied to data simulating N = 133 accelerometer readings from a motorcycle crash (Silverman, 1985). |
| Dataset Splits | Yes | The model is applied to data simulating N = 133 accelerometer readings from a motorcycle crash (Silverman, 1985). κ1, κ2 are Matérn-3/2 kernels, and we use a learning rate of ρ = 0.3 and a quasi-Newton damping rate of ξ = 0.5. The data inputs and outputs are scaled to have zero mean and unit variance, and the kernel hyperparameters (lengthscales and variances) are all fixed at the value 1. We use Gauss-Hermite integration with 20² = 400 points to solve the intractable integrals required for the VI-, EP- and PL-based methods. An example inference result is shown in Figure 2, and the test performance using 4-fold cross validation is shown in Figure 3. ... we plot the 4-fold cross validation results in Figure 5 where we measure both the test NLPD and the RMSE of the posterior mean relative to the ground truth components. ... we remove the middle third of data for two of the three output streams and then compute the NLPD of the removed data as well as the RMSE of the posterior mean relative to the ground truth. |
| Hardware Specification | No | The paper mentions 'computational resources provided by the Aalto Science-IT project' but does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper states 'Python code for the methods and experiments is provided' but does not specify Python version or any software library dependencies with their version numbers. |
| Experiment Setup | Yes | To evaluate our proposed methods, we consider three case studies. ... We set the EP power to α = 0.5. ... We have also added a first-order variational inference method as a baseline to each experiment. This is achieved by training the approximate likelihood mean and covariance via gradient-based optimisation of the VFE using the Adam optimiser with a learning rate of 0.1 (we empirically found this to give the best performance and convergence). ... For 8.4.1: we use a learning rate of ρ = 0.3 and a quasi-Newton damping rate of ξ = 0.5. ... For 8.4.2: We use a learning rate of ρ = 0.1 and a quasi-Newton damping rate of ξ = 0.5. ... For 8.4.3: We use a learning rate of ρ = 0.3 and a quasi-Newton damping rate of ξ = 0.3. ... We use Gauss-Hermite integration with 20² = 400 points to solve the intractable integrals ... we use the 5th-order unscented transform (McNamee and Stenger, 1967) to approximate them instead of Gauss-Hermite. |
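
The "20² = 400 points" figure quoted in the table is what a tensor-product Gauss-Hermite rule gives when 20 nodes are used along each of two latent dimensions. The sketch below illustrates that counting and how such a rule approximates a Gaussian expectation. It is a minimal NumPy illustration, not the authors' implementation: the function name `gauss_hermite_2d`, its arguments, and the test integrand are all assumptions introduced here.

```python
import numpy as np


def gauss_hermite_2d(f, mean, cov, num_points=20):
    """Approximate E[f(x)] for x ~ N(mean, cov) in 2D (hypothetical helper).

    Returns the estimate and the number of quadrature points used.
    """
    # 1D nodes and weights for the weight function exp(-z^2 / 2).
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_points)
    # The raw weights sum to sqrt(2 * pi); rescale so they sum to 1.
    weights = weights / np.sqrt(2.0 * np.pi)

    # Tensor product of the 1D rule: num_points**2 points in 2D.
    z1, z2 = np.meshgrid(nodes, nodes, indexing="ij")
    z = np.stack([z1.ravel(), z2.ravel()], axis=0)  # shape (2, num_points**2)
    w = np.outer(weights, weights).ravel()          # shape (num_points**2,)

    # Map standard-normal points to N(mean, cov) via the Cholesky factor.
    chol = np.linalg.cholesky(cov)
    x = mean[:, None] + chol @ z

    return np.sum(w * f(x)), x.shape[1]


if __name__ == "__main__":
    mean = np.array([1.0, 2.0])
    cov = np.array([[1.0, 0.5], [0.5, 2.0]])
    # E[x0 * x1] = cov[0, 1] + mean[0] * mean[1], which the rule
    # integrates exactly because the integrand is a low-degree polynomial.
    estimate, count = gauss_hermite_2d(lambda x: x[0] * x[1], mean, cov)
    print(count, estimate)
```

The number of points grows as `num_points ** d` with the latent dimension `d`, which is consistent with the table's note that a 5th-order unscented transform is used instead of Gauss-Hermite in one of the experiments.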