Convergence of Sparse Variational Inference in Gaussian Processes Regression

Authors: David R. Burt, Carl Edward Rasmussen, Mark van der Wilk

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we investigate upper and lower bounds on how M needs to grow with N to ensure high quality approximations. We show that we can make the KL-divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with M ≪ N. Specifically, for the popular squared exponential kernel and D-dimensional Gaussian distributed covariates, M = O((log N)^D) suffice, and a method with an overall computational cost of O(N (log N)^{2D} (log log N)^2) can be used to perform inference. ... We provide recommendations on how to select inducing variables in practice, and demonstrate empirical improvements.
Researcher Affiliation | Collaboration | David R. Burt (EMAIL), Carl Edward Rasmussen (EMAIL), Department of Engineering, University of Cambridge, UK; Mark van der Wilk (EMAIL), Department of Computing, Imperial College London, UK, and Prowler.io, Cambridge, UK
Pseudocode | Yes | Algorithm 1: MCMC algorithm for approximately sampling from an M-DPP (Anari et al., 2016).
Input: training inputs X = {x_i}_{i=1}^N, number of points to choose M, kernel k, number of MCMC steps T.
Returns: an (approximate) sample from an M-DPP with kernel matrix K_ff formed by evaluating k at X.
Initialize M columns by greedily selecting columns to maximize the determinant of the resulting submatrix; call the set of indices of these columns Z_0.
for τ = 0 to T − 1 do
  Sample i uniformly from Z_τ and j uniformly from X \ Z_τ.
  Define Z' = Z_τ \ {i} ∪ {j}.
  Compute p_{i,j} := (1/2) min{1, det(K_{Z'}) / det(K_{Z_τ})}.
  With probability p_{i,j}, set Z_{τ+1} = Z'; otherwise Z_{τ+1} = Z_τ.
end for
Return: Z_T
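The algorithm above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the helper names (`rbf_kernel`, `greedy_init`, `mdpp_mcmc`) are hypothetical, and the greedy initialization uses a standard pivoted-Cholesky heuristic to approximately maximize the submatrix determinant.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel matrix for inputs X of shape (N, D).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def greedy_init(K, M):
    # Greedily pick M indices to (approximately) maximize the determinant
    # of the selected principal submatrix, via pivoted-Cholesky residuals.
    N = K.shape[0]
    chosen = []
    residual = np.diag(K).astype(float).copy()
    L = np.zeros((M, N))
    for m in range(M):
        i = int(np.argmax(residual))
        chosen.append(i)
        l = (K[i] - L[:m].T @ L[:m, i]) / np.sqrt(residual[i])
        L[m] = l
        residual = residual - l**2
        residual[chosen] = -np.inf  # never re-select a chosen index
    return chosen

def mdpp_mcmc(K, M, T, rng=None):
    # Approximate sample from an M-DPP with kernel matrix K, using the
    # swap-based Metropolis chain of Anari et al. (2016): propose swapping
    # one selected index for one unselected index, accept with
    # (1/2) * min(1, det ratio), computed in log space for stability.
    rng = np.random.default_rng(rng)
    N = K.shape[0]
    Z = set(greedy_init(K, M))
    logdet = lambda S: np.linalg.slogdet(K[np.ix_(sorted(S), sorted(S))])[1]
    cur = logdet(Z)
    for _ in range(T):
        i = rng.choice(sorted(Z))
        j = rng.choice(sorted(set(range(N)) - Z))
        Znew = (Z - {i}) | {j}
        new = logdet(Znew)
        if rng.random() < 0.5 * np.exp(min(0.0, new - cur)):
            Z, cur = Znew, new
    return sorted(Z)
```

The returned indices would serve as inducing-point locations; a small jitter on the diagonal of K keeps the log-determinants well defined for nearly singular kernel matrices.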
Open Source Code | Yes | We provide a GPflow-based (Matthews et al., 2017) implementation of the initialization methods and experiments that builds on other open source software (Coelho, 2017; Virtanen et al., 2020), available at https://github.com/markvdw/RobustGP.
Open Datasets | Yes | We consider 3 data sets from the UCI repository that are commonly used in benchmarking regression algorithms: Naval (Ntrain = 10740, Ntest = 1194, D = 14), Elevators (Ntrain = 14939, Ntest = 1660, D = 18), and Energy (Ntrain = 691, Ntest = 77, D = 8).
Dataset Splits | Yes | We consider 3 data sets from the UCI repository that are commonly used in benchmarking regression algorithms: Naval (Ntrain = 10740, Ntest = 1194, D = 14), Elevators (Ntrain = 14939, Ntest = 1660, D = 18), and Energy (Ntrain = 691, Ntest = 77, D = 8).
Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments.
Software Dependencies | No | We provide a GPflow-based (Matthews et al., 2017) implementation of the initialization methods and experiments that builds on other open source software (Coelho, 2017; Virtanen et al., 2020)... For K-means, we run the Scipy implementation of K-means++ with M centres... We ran 10^4 steps of L-BFGS... with the default Scipy settings.
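The K-means++ initialization the quote refers to can be sketched with SciPy's `kmeans2` (`minit='++'` selects k-means++ seeding); the data and sizes here are hypothetical stand-ins, not the paper's UCI inputs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical example data; the paper uses UCI regression inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))  # N = 500 points in D = 8 dimensions
M = 32                         # number of inducing points

# kmeans2 with minit='++' uses k-means++ seeding; the resulting
# centroids serve as initial inducing-point locations Z.
Z, labels = kmeans2(X, M, minit='++', seed=0)
```

`Z` has shape (M, D) and can be passed directly to a sparse GP model as its initial inducing inputs.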
Experiment Setup | Yes | The hyperparameters are set to the optimal values for an exact GP model, or, for Naval, a sparse GP with 1000 inducing points... For all experiments, we use a squared exponential kernel with automatic relevance determination (ARD), i.e. a separate lengthscale per input dimension... We ran 10^4 steps of L-BFGS, at which point any improvement was negligible compared to adding more inducing variables.
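The squared exponential kernel with ARD mentioned in the setup divides each input dimension by its own lengthscale before computing squared distances. A minimal NumPy sketch (function name `se_ard` is illustrative, not from the paper's code):

```python
import numpy as np

def se_ard(X1, X2, lengthscales, variance=1.0):
    # Squared-exponential kernel with automatic relevance determination:
    # a separate lengthscale per input dimension. X1: (N1, D), X2: (N2, D),
    # lengthscales: (D,). Returns the (N1, N2) kernel matrix.
    A = X1 / lengthscales  # rescale each dimension independently
    B = X2 / lengthscales
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * d2)
```

A large learned lengthscale in one dimension effectively switches that input off, which is what makes the per-dimension parameterization a "relevance determination" mechanism.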