Generic Inference in Latent Gaussian Process Models

Authors: Edwin V. Bonilla, Karl Krauth, Amir Dezfouli

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach quantitatively and qualitatively with experiments on small datasets, medium-scale datasets and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On the large-scale experiments involving prediction of airline delays and classification of handwritten digits, we show that our method is on par with the state-of-the-art hard-coded approaches for scalable GP regression and classification."
Researcher Affiliation | Academia | Edwin V. Bonilla (EMAIL), Machine Learning Research Group, CSIRO's Data61, Sydney NSW 2015, Australia; Karl Krauth (EMAIL), Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720-1776, USA; Amir Dezfouli (EMAIL), Machine Learning Research Group, CSIRO's Data61, Sydney NSW 2015, Australia
Pseudocode | No | The paper describes the inference method through detailed mathematical derivations and explanations, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | Yes | "We have implemented our SAVIGP method in Python and all the code is publicly available at https://github.com/Karl-Krauth/Sparse-GP."
Open Datasets | Yes | The datasets are summarized in Table 9.3 and are the same as those used by Nguyen and Bonilla (2014a). For example, "boston" and "abalone" refer to Bache and Lichman (2013), which points to the "UCI Machine Learning Repository, 2013. URL http://archive.ics.uci.edu/ml". The "mnist" dataset, "mnist8m (Loosli et al., 2007)", and the "sarcos dataset (Vijayakumar and Schaal, 2000)" are also cited.
Dataset Splits | Yes | The datasets are summarized in Table 9.3, including Ntrain and Ntest for each. For the airline delay prediction, "we selected the first 700,000 data points starting at a given offset as the training set and the next 100,000 data points as the test set. We generated five training/test sets by setting the initial offset to 0 and increasing it by 200,000 each time."
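The airline split scheme quoted above is easy to misread, so here is a minimal sketch of it as code. The helper name and defaults are hypothetical (the paper's actual preprocessing code may differ); only the numbers 700,000 / 100,000 / 200,000 and the five splits come from the quoted text.

```python
# Hypothetical helper reproducing the split scheme described in the quote:
# five train/test sets, each taking 700,000 training rows starting at an
# offset and the next 100,000 rows as the test set, with the offset
# advancing by 200,000 between splits.

def airline_splits(n_train=700_000, n_test=100_000, step=200_000, n_splits=5):
    """Return a list of (train_range, test_range) half-open index pairs."""
    splits = []
    for i in range(n_splits):
        offset = i * step
        train = (offset, offset + n_train)
        test = (offset + n_train, offset + n_train + n_test)
        splits.append((train, test))
    return splits

splits = airline_splits()
# First split: train on rows [0, 700000), test on rows [700000, 800000).
```

Note that consecutive splits overlap heavily (each offset moves by only 200,000 rows), so the five test sets are not disjoint from later training sets.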
Hardware Specification | Yes | "Most of our experiments were either run on g2.2 AWS instances, or on a desktop machine with an Intel Core i5-4460 CPU, 8GB of RAM, and a GTX 760 GPU."
Software Dependencies | No | "We have implemented our SAVIGP method in Python... an implementation of SAVIGP that uses Theano (Al-Rfou et al., 2016), a library that allows users to define symbolic mathematical expressions that get compiled to highly optimized GPU CUDA code." The text names the software used but gives no specific version numbers for Theano or CUDA.
Experiment Setup | Yes | For optimization in the batch setting, each set of parameters was optimized using L-BFGS, with the maximum number of global iterations limited to 200. For stochastic optimization, the Adadelta method (Zeiler, 2012) was used with parameters ϵ = 10^-6 and a decay rate of 0.95. SAVIGP was trained on the mnist8m dataset by optimizing only the variational parameters stochastically, with a batch size of 1000 and 2000 inducing points. Prior mean depths of 200m, 500m, 1600m and 2200m and prior mean velocities of 1950m/s, 2300m/s, 2750m/s and 3650m/s were used; the corresponding standard deviations for the depths were set to 15% of the layer mean, and for the velocities to 10% of the layer mean. A squared exponential covariance function with unit length-scale was used.
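For reference, the Adadelta update rule with the quoted hyperparameters (ϵ = 10^-6, decay rate ρ = 0.95) can be sketched as below. This is an illustrative re-implementation of Zeiler (2012), not the paper's code; the class and variable names are my own.

```python
import numpy as np

class Adadelta:
    """Minimal Adadelta sketch (Zeiler, 2012): per-parameter step sizes
    derived from decaying averages of squared gradients and squared updates.
    Defaults match the values quoted above: rho = 0.95, eps = 1e-6."""

    def __init__(self, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.acc_grad = None   # running average of squared gradients, E[g^2]
        self.acc_step = None   # running average of squared updates, E[dx^2]

    def step(self, params, grad):
        if self.acc_grad is None:
            self.acc_grad = np.zeros_like(params)
            self.acc_step = np.zeros_like(params)
        # Accumulate gradient: E[g^2] <- rho * E[g^2] + (1 - rho) * g^2
        self.acc_grad = self.rho * self.acc_grad + (1 - self.rho) * grad ** 2
        # Update: dx = -sqrt(E[dx^2] + eps) / sqrt(E[g^2] + eps) * g
        delta = -np.sqrt(self.acc_step + self.eps) \
                / np.sqrt(self.acc_grad + self.eps) * grad
        # Accumulate update: E[dx^2] <- rho * E[dx^2] + (1 - rho) * dx^2
        self.acc_step = self.rho * self.acc_step + (1 - self.rho) * delta ** 2
        return params + delta

# Toy usage: minimize f(x) = x^2 (gradient 2x) for a few iterations.
opt = Adadelta()
x = np.array([1.0])
for _ in range(100):
    x = opt.step(x, 2 * x)
```

Note the absence of a global learning rate: Adadelta's per-parameter step size adapts from the two accumulators, which is convenient when optimizing heterogeneous variational parameters in batches, as in the stochastic setting described above.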