New Bounds for Sparse Variational Gaussian Processes

Authors: Michalis Titsias

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On several datasets we demonstrate that our method can reduce bias when learning the hyperparameters and can lead to better predictive performance." The paper includes a dedicated "5. Experiments" section detailing empirical evaluations on various datasets with performance metrics.
Researcher Affiliation | Industry | "Google DeepMind. Correspondence to: Michalis K. Titsias <EMAIL>."
Pseudocode | No | The paper describes its methods mathematically and textually; no pseudocode or algorithm blocks are present.
Open Source Code | No | The paper mentions a third-party library used by the authors (GPflow), but does not provide their own implementation code, nor any explicit statement about releasing code or a link to a repository.
Open Datasets | Yes | "In the first regression experiment we consider the 1-D Snelson dataset (Snelson & Ghahramani, 2006)." "To further investigate the findings from the previous section, we consider three medium-size real-world UCI regression datasets (Pol, Bike, and Elevators)." "Secondly, we consider a real dataset (NYBikes) about bicycle crossings going over bridges in New York City. This dataset is freely available from https://www.kaggle.com/datasets/new-york-city/nyc-east-river-bicycle-crossings."
Dataset Splits | Yes | "By following Wang et al. (2019) and Shi et al. (2020) we consider 80% / 20% training / test splits. A 20% subset of the training set is used for validation."
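The splitting protocol quoted above can be sketched as follows. This is illustrative only: the exact shuffling, seeding, and per-dataset handling used in the paper are not specified, so the function name and seed here are assumptions.

```python
import numpy as np

def make_splits(n, seed=0):
    """Illustrative 80%/20% train/test split, with a further 20% of the
    training indices held out for validation (protocol as quoted; the
    shuffling and seed are assumptions, not the authors' code)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.8 * n)          # 80% of all points go to training
    train, test = idx[:n_train], idx[n_train:]
    n_val = int(0.2 * n_train)      # 20% of the training set -> validation
    val, train = train[:n_val], train[n_val:]
    return train, val, test

train_idx, val_idx, test_idx = make_splits(1000)
```

For n = 1000 this yields 640 training, 160 validation, and 200 test indices, with no overlap between the three sets.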
Hardware Specification | Yes | "For the Bike dataset the initial train size (...) is N = 11122 (with d = 17) but since Exact GP training gave an out-of-memory error when running on a V100 GPU, we had to slightly reduce the training size to N = 10600."
Software Dependencies | No | The paper mentions software such as GPflow (de G. Matthews et al., 2017) and the Adam optimizer but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | "For the hyperparameters σ², σ_f², ℓ² (or ℓ_i² for ARD kernels) we use the softplus activation to parametrize the square roots of these parameters, i.e., to parametrize σ, σ_f, ℓ_i. For all experiments we use the initializations σ = 0.51, σ_f = 0.69, ℓ_i = 1.0. The inducing inputs Z are initialized by running at most 30 iterations of k-means clustering with the centers initialized at a random training data subset. For training, we perform 10000 optimization iterations using the Adam optimizer with a base learning rate of 0.01." "Following these settings, for all datasets we train for 100 epochs using Adam with learning rate 0.01 and minibatch size 1024."
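The two initialization steps quoted above (softplus parametrization of σ, σ_f, ℓ_i, and k-means initialization of the inducing inputs Z) can be sketched in NumPy. This is a minimal illustration, not the authors' GPflow code; the function names and the k-means seed are assumptions.

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def inv_softplus(y):
    # inverse softplus, used to set a raw parameter so that
    # softplus(raw) equals the desired initial value
    return np.log(np.expm1(y))

# Stated initializations: sigma = 0.51, sigma_f = 0.69, lengthscale = 1.0.
# The raw (unconstrained) parameters are what the optimizer would update;
# the model uses the squares sigma**2, sigma_f**2, ell**2.
raw = {name: inv_softplus(v)
       for name, v in [("sigma", 0.51), ("sigma_f", 0.69), ("ell", 1.0)]}
noise_variance = softplus(raw["sigma"]) ** 2  # sigma**2

def kmeans_init(X, m, iters=30, seed=0):
    """Initialize inducing inputs Z with at most `iters` k-means steps,
    starting the centers at a random training-data subset (seed is an
    assumption for reproducibility of this sketch)."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest center, then recompute means
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for k in range(m):
            members = X[assign == k]
            if len(members) > 0:
                Z[k] = members.mean(axis=0)
    return Z
```

Parametrizing the square roots through a softplus keeps σ, σ_f, ℓ_i strictly positive while letting Adam optimize unconstrained raw values.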