Convergence of Sparse Variational Inference in Gaussian Processes Regression
Authors: David R. Burt, Carl Edward Rasmussen, Mark van der Wilk
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we investigate upper and lower bounds on how M needs to grow with N to ensure high quality approximations. We show that we can make the KL-divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with M ≪ N. Specifically, for the popular squared exponential kernel and D-dimensional Gaussian distributed covariates, M = O((log N)^D) suffices and a method with an overall computational cost of O(N (log N)^(2D) (log log N)^2) can be used to perform inference. ... We provide recommendations on how to select inducing variables in practice, and demonstrate empirical improvements. |
| Researcher Affiliation | Collaboration | David R. Burt (EMAIL) and Carl Edward Rasmussen (EMAIL), Department of Engineering, University of Cambridge, UK; Mark van der Wilk (EMAIL), Department of Computing, Imperial College London, UK, and Prowler.io, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1: MCMC algorithm for approximately sampling from an M-DPP (Anari et al., 2016). Input: training inputs X = {x_i}_{i=1}^N, number of points to choose M, kernel k, number of MCMC steps T. Returns: an (approximate) sample from an M-DPP with kernel matrix K_ff formed by evaluating k at X. Initialize by greedily selecting M columns to maximize the determinant of the resulting submatrix; call this set of column indices Z_0. For τ = 1 to T: sample i uniformly from Z_τ and j uniformly from X \ Z_τ; define Z' = Z_τ \ {i} ∪ {j}; compute p_{i,j} := (1/2) min{1, det(K_{Z'}) / det(K_{Z_τ})}; with probability p_{i,j} set Z_{τ+1} = Z', otherwise Z_{τ+1} = Z_τ. Return: Z_T. |
| Open Source Code | Yes | We provide a GPflow-based (Matthews et al., 2017) implementation of the initialization methods and experiments that builds on other open source software (Coelho, 2017; Virtanen et al., 2020), available at https://github.com/markvdw/RobustGP. |
| Open Datasets | Yes | We consider 3 data sets from the UCI repository that are commonly used in benchmarking regression algorithms, Naval (Ntrain = 10740, Ntest = 1194, D = 14), Elevators (Ntrain = 14939, Ntest = 1660, D = 18) and Energy (Ntrain = 691, Ntest = 77, D = 8). |
| Dataset Splits | Yes | We consider 3 data sets from the UCI repository that are commonly used in benchmarking regression algorithms, Naval (Ntrain = 10740, Ntest = 1194, D = 14), Elevators (Ntrain = 14939, Ntest = 1660, D = 18) and Energy (Ntrain = 691, Ntest = 77, D = 8). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments. |
| Software Dependencies | No | We provide a GPflow-based (Matthews et al., 2017) implementation of the initialization methods and experiments that builds on other open source software (Coelho, 2017; Virtanen et al., 2020)... For K-means, we run the Scipy implementation of K-means++ with M centres... We ran 10^4 steps of L-BFGS... using the default Scipy settings. |
| Experiment Setup | Yes | The hyperparameters are set to the optimal values for an exact GP model, or, for Naval, a sparse GP with 1000 inducing points... For all experiments, we use a squared exponential kernel with automatic relevance determination (ARD), i.e. a separate lengthscale per input dimension... We ran 10^4 steps of L-BFGS, at which point any improvement was negligible compared to adding more inducing variables. |
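The pseudocode quoted in the Pseudocode row can be sketched in Python. This is a minimal illustrative sketch, not the authors' implementation: the function name `sample_mdpp_mcmc`, the `kernel` callable interface, and the brute-force greedy initialization are all assumptions made here for clarity.

```python
import numpy as np

def sample_mdpp_mcmc(X, M, kernel, T, rng=None):
    """Approximately sample M column indices from an M-DPP via MCMC
    (in the spirit of Algorithm 1 / Anari et al., 2016).
    `kernel(A, B)` is assumed to return the kernel matrix between
    the rows of A and the rows of B."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    K = kernel(X, X)  # N x N kernel matrix K_ff

    # Greedy initialization: repeatedly add the column that maximizes
    # the (log-)determinant of the resulting submatrix.
    Z = []
    for _ in range(M):
        best, best_logdet = None, -np.inf
        for j in range(N):
            if j in Z:
                continue
            idx = Z + [j]
            sign, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = j, logdet
        Z.append(best)

    Z = set(Z)
    for _ in range(T):
        # Propose swapping one index in Z for one outside of Z.
        i = rng.choice(sorted(Z))
        j = rng.choice(sorted(set(range(N)) - Z))
        Z_prime = (Z - {i}) | {j}
        # Acceptance probability (1/2) * min{1, det(K_Z') / det(K_Z)},
        # computed stably via log-determinants.
        _, ld_old = np.linalg.slogdet(K[np.ix_(sorted(Z), sorted(Z))])
        _, ld_new = np.linalg.slogdet(K[np.ix_(sorted(Z_prime), sorted(Z_prime))])
        p = 0.5 * min(1.0, float(np.exp(ld_new - ld_old)))
        if rng.random() < p:
            Z = Z_prime
    return sorted(Z)
```

The greedy initialization here is O(N M) determinant evaluations and is only meant to mirror the algorithm's description; the paper's cost analysis relies on more efficient implementations.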
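The Software Dependencies row mentions a K-means++ initialization of inducing points via Scipy. A minimal sketch of that idea, assuming Scipy's `kmeans2` with `minit='++'` (the function name `kmeans_inducing_points` is hypothetical and not from the paper's code):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def kmeans_inducing_points(X, M, seed=0):
    """Initialize M inducing points as K-means centres of the training
    inputs, using K-means++ seeding. This is an illustrative sketch of
    the initialization described in the report, not the authors' code."""
    centres, _ = kmeans2(X, M, minit='++', seed=seed)
    return centres
```

The returned centres can then be passed as the initial inducing-input locations of a sparse GP before hyperparameter optimization.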