New Bounds for Sparse Variational Gaussian Processes
Authors: Michalis Titsias
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance. The paper includes a dedicated "5. Experiments" section, detailing empirical evaluations on various datasets with performance metrics. |
| Researcher Affiliation | Industry | ¹Google DeepMind. Correspondence to: Michalis K. Titsias <EMAIL>. |
| Pseudocode | No | The paper describes methods mathematically and textually; no specific pseudocode or algorithm blocks are present. |
| Open Source Code | No | The text discusses the source code of a third-party tool or platform that the authors used (GPflow), but does not provide their own implementation code, nor any explicit statement about releasing code or links to a repository. |
| Open Datasets | Yes | In the first regression experiment we consider the 1-D Snelson dataset (Snelson & Ghahramani, 2006). To further investigate the findings from the previous section, we consider three medium-sized real-world UCI regression datasets (Pol, Bike, and Elevators). Secondly, we consider a real dataset (NYBikes) about bicycle crossings over bridges in New York City². This dataset is freely available from https://www.kaggle.com/datasets/new-york-city/nyc-east-river-bicycle-crossings. |
| Dataset Splits | Yes | By following Wang et al. (2019) and Shi et al. (2020) we consider 80% / 20% training / test splits. A 20% subset of the training set is used for validation. |
| Hardware Specification | Yes | For the Bike dataset the initial train size (...) is N = 11122 (with d = 17), but since Exact GP training gave an out-of-memory error when running on a V100 GPU, we had to slightly reduce the training size to N = 10600. |
| Software Dependencies | No | The paper mentions software like GPflow (de G. Matthews et al., 2017) and Adam optimizer but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For the hyperparameters σ², σ_f², ℓ² (or ℓ_i² for ARD kernels) we use the softplus activation to parametrize the square roots of these parameters, i.e., to parametrize σ, σ_f, ℓ_i. For all experiments we use the initializations σ = 0.51, σ_f = 0.69, ℓ_i = 1.0. The inducing inputs Z are initialized by running at most 30 iterations of k-means clustering with the centers initialized at a random training data subset. For training, we perform 10000 optimization iterations using the Adam optimizer with base learning rate 0.01. Following these settings, for all datasets we train for 100 epochs using Adam with learning rate 0.01 and minibatch size 1024. |
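The setup quoted above (softplus parametrization of the positive hyperparameters, and k-means initialization of the inducing inputs Z from a random training subset) can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the toy data, the number of inducing points `M`, and the helper names are assumptions.

```python
import numpy as np

def softplus(x):
    # maps an unconstrained value to a positive one: log(1 + exp(x))
    return np.log1p(np.exp(x))

def softplus_inverse(y):
    # inverse transform: x = log(exp(y) - 1), used to set raw initial values
    return np.log(np.expm1(y))

# constrained initial values from the reported setup
init = {"sigma": 0.51, "sigma_f": 0.69, "lengthscale": 1.0}
# the optimizer would update these unconstrained (raw) values
raw = {k: softplus_inverse(v) for k, v in init.items()}
# softplus maps the raw values back to the positive hyperparameters
recovered = {k: softplus(v) for k, v in raw.items()}

# k-means initialization of the inducing inputs Z (at most 30 iterations,
# centers initialized at a random training data subset)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))   # toy training inputs (illustrative)
M = 20                          # number of inducing points (assumption)
Z = X[rng.choice(len(X), size=M, replace=False)].copy()
for _ in range(30):
    # assign each point to its nearest center, then recompute centers
    assign = np.argmin(((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1), axis=1)
    for m in range(M):
        members = X[assign == m]
        if len(members):
            Z[m] = members.mean(axis=0)
```

Because softplus is invertible on the positives, initializing the raw values with `softplus_inverse` reproduces the quoted initial hyperparameters exactly after the forward transform.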