Gauss-Legendre Features for Gaussian Process Regression
Authors: Paz Fink Shustin, Haim Avron
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report experiments evaluating the performance of our proposed quadrature-based approach. Our goal is to show that, indeed, if U and s are set to be large enough, our method yields results that are essentially indistinguishable from using the exact kernel while offering faster hyperparameter learning, training, and prediction. Clearly, from the theoretical results, our method predominantly applies to low-dimensional datasets (for example, such datasets are prevalent in spatial statistics), so we experiment with one-dimensional and two-dimensional datasets. We experiment both with the Gaussian kernel and the Matérn kernel. In the graphs, we label our method as GLF-GPR (standing for Gauss-Legendre Features Gaussian Process Regression). We use the following methods as benchmarks: exact GPR (labeled in the graphs as Exact-GPR), GPR based on random Fourier features (labeled RFF-GPR), and GPR based on modified random Fourier features (labeled MRF-GPR). |
| Researcher Affiliation | Academia | Paz Fink Shustin, Department of Applied Mathematics, Tel Aviv University, Tel Aviv 69978, Israel; Haim Avron, Department of Applied Mathematics, Tel Aviv University, Tel Aviv 69978, Israel |
| Pseudocode | No | The paper describes methods and derivations in prose and mathematical equations but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "The various methods were implemented in MATLAB." However, it does not provide any explicit statement about making the code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Next we consider the natural sound benchmark used in (Wilson and Nickisch, 2015) (without hyperparameter learning) and (Dong et al., 2017) (with hyperparameter learning). ... Next we consider a two-dimensional function: f₂(x₁, x₂) = (sin(x₁) + sin(10e x₁))(sin(x₂) + sin(10e x₂)). The function was sampled on a uniform grid on [−1, 1] × [−1, 1] with n = 4096 samples. ... We consider time series data of the daily high stock price of Google spanning 3797 days from 19th August 2004 to 19th September 2019. ... Similar to (Ton et al., 2018), we consider MOD11A2 Land Surface Temperature (LST) 8-day composite 2D data of synoptic yearly mean for 2016 in the East Africa region. |
| Dataset Splits | Yes | The function was sampled equidistantly on [−1, 1] with n = 800 samples. ... The function was sampled on a uniform grid on [−1, 1] × [−1, 1] with n = 4096 samples. ... The test set is 12% of the data, i.e., it consists of 502 days. ... For the training set, we randomly sample 77404 LST locations and set x = {(Longitude, Latitude)} and y = {temperature}. We examine the MSE errors on the remaining 6005 locations but use all 83409 data points to draw maps. |
| Hardware Specification | Yes | Running times were measured on a machine with two 3.2GHz Intel(R) Xeon(R) Gold 6134 CPUs, each having 8 cores, and 256GB RAM. |
| Software Dependencies | No | The paper states: "The various methods were implemented in MATLAB." However, it does not specify a version number for MATLAB or any other libraries or software used. |
| Experiment Setup | Yes | Optimizing the hyperparameters was conducted using the MATLAB function fmincon after transforming the hyperparameters to a logarithmic scale. For each problem, we defined a hyperparameter domain, e.g., Θ = {[ℓ, σ²_n, σ²_f] : ℓ₀ ≤ ℓ ≤ ℓ₁, σ²_{n0} ≤ σ²_n ≤ σ²_{n1}, σ²_{f1} ≤ σ²_f ≤ σ²_{f0}}, and we take the initial hyperparameters for the optimization to be [ℓ₀, σ²_{f0}, σ²_{n0}]. ... The data is generated by noisily sampling a predetermined function, i.e., samples are generated from the formula y_i = f(x_i) + τ_i, where f is the true function and {τ_i} are i.i.d. noise terms, distributed as normal variables with variance σ²_τ = 0.5² (for 1D) or covariance σ²_τ = 0.3²·I₂ (for 2D). In these experiments we use the isotropic Gaussian kernel. ... We use the Matérn kernel with ν = 5/2. ... We also use the anisotropic Matérn kernel with ν = 1. |
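The RFF-GPR baseline quoted above, and the paper's data-generation recipe y_i = f(x_i) + τ_i with noise standard deviation 0.5 on n = 800 equidistant 1D samples, can be sketched as follows. This is not the authors' MATLAB code; it is a minimal NumPy illustration, and the true function f, the lengthscale, and the number of features s are placeholder choices, not values from the paper.

```python
import numpy as np

def make_rff(d, s, lengthscale, rng):
    # Random Fourier features approximating the isotropic Gaussian kernel
    # k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)); W holds spectral samples.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return lambda X: np.sqrt(2.0 / s) * np.cos(X @ W + b)

def gp_posterior_mean(Xtr, ytr, Xte, phi, sigma_f2, sigma_n2):
    # With an explicit feature map, the GP posterior mean reduces to
    # ridge regression in feature space: an s x s solve instead of n x n.
    Ztr = np.sqrt(sigma_f2) * phi(Xtr)
    Zte = np.sqrt(sigma_f2) * phi(Xte)
    s = Ztr.shape[1]
    A = Ztr.T @ Ztr + sigma_n2 * np.eye(s)
    w = np.linalg.solve(A, Ztr.T @ ytr)
    return Zte @ w

# Synthetic 1D data per the report's recipe: y_i = f(x_i) + tau_i, tau ~ N(0, 0.5^2).
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 800)[:, None]
f = np.sin(3 * np.pi * X[:, 0])          # stand-in true function (not from the paper)
y = f + 0.5 * rng.normal(size=800)
phi = make_rff(d=1, s=400, lengthscale=0.2, rng=rng)
mean = gp_posterior_mean(X, y, X, phi, sigma_f2=1.0, sigma_n2=0.25)
print(np.sqrt(np.mean((mean - f) ** 2)))  # RMSE against the noiseless function
```

The feature-space solve is what makes feature-based methods (RFF-GPR, and the paper's GLF-GPR with its quadrature features) cheaper than exact GPR when s ≪ n.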
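The hyperparameter-learning setup (maximizing the log marginal likelihood over a boxed domain, done in the paper with fmincon on a log scale) can be illustrated for the exact Matérn ν = 5/2 kernel. This is a hedged sketch, not the paper's implementation: the grid search stands in for fmincon, and the data, grid values, and fixed σ²_f, σ²_n are illustrative assumptions.

```python
import numpy as np

def matern52(X1, X2, lengthscale, sigma_f2):
    # Matérn kernel with nu = 5/2 (1D inputs):
    # k(r) = sigma_f2 * (1 + a + a^2/3) * exp(-a), a = sqrt(5) * r / lengthscale
    r = np.abs(X1[:, None, 0] - X2[None, :, 0])
    a = np.sqrt(5.0) * r / lengthscale
    return sigma_f2 * (1.0 + a + a**2 / 3.0) * np.exp(-a)

def log_marginal_likelihood(X, y, lengthscale, sigma_f2, sigma_n2):
    # Standard GP log marginal likelihood via a Cholesky factorization.
    n = len(y)
    K = matern52(X, X, lengthscale, sigma_f2) + sigma_n2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(1)
X = np.linspace(-1.0, 1.0, 200)[:, None]
y = np.sin(4 * X[:, 0]) + 0.5 * rng.normal(size=200)

# Coarse search over lengthscales, standing in for fmincon on log(theta).
grid = [0.05, 0.1, 0.2, 0.5, 1.0]
best = max(grid, key=lambda l: log_marginal_likelihood(X, y, l, 1.0, 0.25))
print(best)
```

Searching in log space, as the paper does, keeps the positivity constraints on ℓ, σ²_f, σ²_n implicit and tends to condition the optimization better.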