New Bounds for Sparse Variational Gaussian Processes

Authors: Michalis Titsias

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance." The paper includes a dedicated "5. Experiments" section detailing empirical evaluations on various datasets with performance metrics.
Researcher Affiliation | Industry | "Google DeepMind. Correspondence to: Michalis K. Titsias <EMAIL>."
Pseudocode | No | The paper describes its methods mathematically and textually; no pseudocode or algorithm blocks are present.
Open Source Code | No | The paper mentions a third-party library used by the authors (GPflow), but does not provide their own implementation code, nor any explicit statement about releasing code or a link to a repository.
Open Datasets | Yes | "In the first regression experiment we consider the 1-D Snelson dataset (Snelson & Ghahramani, 2006)." "To further investigate the findings from the previous section, we consider three medium-size real-world UCI regression datasets (Pol, Bike, and Elevators)." "Secondly, we consider a real dataset (NYBikes) about bicycle crossings going over bridges in New York City. This dataset is freely available from https://www.kaggle.com/datasets/new-york-city/nyc-east-river-bicycle-crossings."
Dataset Splits | Yes | "By following Wang et al. (2019) and Shi et al. (2020) we consider 80% / 20% training / test splits. A 20% subset of the training set is used for validation."
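The splitting protocol quoted above can be sketched as follows. This is illustrative only: the exact shuffling, seeding, and per-dataset handling used in the paper are not specified, so the function name and seed here are assumptions.

```python
import numpy as np

def make_splits(n, seed=0):
    """Illustrative 80%/20% train/test split, with a further 20% of the
    training indices held out for validation (protocol as quoted; the
    shuffling and seed are assumptions, not the authors' code)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.8 * n)          # 80% of all points go to training
    train, test = idx[:n_train], idx[n_train:]
    n_val = int(0.2 * n_train)      # 20% of the training set -> validation
    val, train = train[:n_val], train[n_val:]
    return train, val, test

train_idx, val_idx, test_idx = make_splits(1000)
```

For n = 1000 this yields 640 training, 160 validation, and 200 test indices, with no overlap between the three sets.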
Hardware Specification | Yes | "For the Bike dataset the initial train size (...) is N = 11122 (with d = 17) but since Exact GP training gave an out-of-memory error when running on a V100 GPU, we had to slightly reduce the training size to N = 10600."
Software Dependencies | No | The paper mentions software such as GPflow (de G. Matthews et al., 2017) and the Adam optimizer but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "For the hyperparameters σ², σ_f², ℓ² (or ℓ_i² for ARD kernels) we use the softplus activation to parametrize the square roots of these parameters, i.e., to parametrize σ, σ_f, ℓ_i. For all experiments we use the initializations σ = 0.51, σ_f = 0.69, ℓ_i = 1.0. The inducing inputs Z are initialized by running at most 30 iterations of k-means clustering with the centers initialized at a random training data subset. For training, we perform 10000 optimization iterations using the Adam optimizer with a base learning rate of 0.01." "Following these settings, for all datasets we train for 100 epochs using Adam with learning rate 0.01 and minibatch size 1024."
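The two initialization steps quoted above (softplus parametrization of σ, σ_f, ℓ_i, and k-means initialization of the inducing inputs Z) can be sketched in NumPy. This is a minimal illustration, not the authors' GPflow code; the function names and the k-means seed are assumptions.

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def inv_softplus(y):
    # inverse softplus, used to set a raw parameter so that
    # softplus(raw) equals the desired initial value
    return np.log(np.expm1(y))

# Stated initializations: sigma = 0.51, sigma_f = 0.69, lengthscale = 1.0.
# The raw (unconstrained) parameters are what the optimizer would update;
# the model uses the squares sigma**2, sigma_f**2, ell**2.
raw = {name: inv_softplus(v)
       for name, v in [("sigma", 0.51), ("sigma_f", 0.69), ("ell", 1.0)]}
noise_variance = softplus(raw["sigma"]) ** 2  # sigma**2

def kmeans_init(X, m, iters=30, seed=0):
    """Initialize inducing inputs Z with at most `iters` k-means steps,
    starting the centers at a random training-data subset (seed is an
    assumption for reproducibility of this sketch)."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest center, then recompute means
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for k in range(m):
            members = X[assign == k]
            if len(members) > 0:
                Z[k] = members.mean(axis=0)
    return Z
```

Parametrizing the square roots through a softplus keeps σ, σ_f, ℓ_i strictly positive while letting Adam optimize unconstrained raw values.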