Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Gaussian Processes with Bayesian Inference of Covariate Couplings
Authors: Mattia Rosso, Juho Ylä-Jääski, Zheyang Shen, Markus Heinonen, Maurizio Filippone
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the efficacy and interpretability of this approach. We consider eight UCI datasets as a benchmark to evaluate the performance of GP models for regression and classification tasks. We standardize all datasets to zero mean and unit variance, and report all results with five-fold cross-validation. Following previous works (e.g., Rossi et al. (2021)), we report test MNLL for all data, and normalized root mean square error (RMSE) for regression and error for classification tasks. |
| Researcher Affiliation | Academia | Mattia Rosso KAUST, Saudi Arabia Juho Ylä-Jääski Aalto University, Finland Zheyang Shen Newcastle University, UK Markus Heinonen Aalto University, Finland Maurizio Filippone KAUST, Saudi Arabia |
| Pseudocode | No | The paper describes methods and models using mathematical formulations and descriptive text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce the results on the UCI and MoCap datasets can be found at https://github.com/mattyred/BayesianSGP_Automatic_Coupling_Determination and https://github.com/mattyred/gaussian-process-odes-acd/ respectively. |
| Open Datasets | Yes | We consider eight UCI datasets as a benchmark to evaluate the performance of gp models for regression and classification tasks. Table 4: UCI datasets used, including number of datapoints and dimensionalities. |
| Dataset Splits | Yes | We standardize all datasets to zero mean and unit variance, and report all results with five-fold cross-validation. |
| Hardware Specification | No | All the experiments were conducted on Google Colab. This statement indicates the environment but lacks specific hardware details such as GPU models or CPU specifications. |
| Software Dependencies | No | The paper mentions using an 'Adam optimizer' and 'stochastic gradient MCMC' but does not provide specific version numbers for any programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In all experiments, we chose to approximate GPs with 500 inducing points. We ran BSGP for 10,000 iterations with a step-size of 0.01 and mini-batch of 1,000 data points. We evaluate performance on test data from 50 samples collected during training after 1,500 burn-in iterations and using a thinning of 180. We set the default hyperparameter of the number of SGHMC steps to K = 10. Table 3: num. of inducing points 500; mini-batch size 1,000; num. iterations 10,500; step size 0.01; momentum 0.05; num. of burn-in steps 1,500; num. of samples 50; thinning interval 180 |
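The evaluation protocol quoted in the Dataset Splits and Research Type rows (standardize to zero mean and unit variance, five-fold cross-validation) can be sketched as below. This is a minimal illustration of that protocol, not code from the paper's repositories: the helper names, the random seed, and the exact fold-splitting procedure are assumptions, and the statistics are computed on the training fold only to avoid leakage.

```python
import numpy as np

def five_fold_indices(n, seed=0):
    # Shuffle indices and split into 5 folds (illustrative; the paper
    # does not state the exact splitting procedure beyond "five-fold").
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 5)

def standardize(train, test):
    # Zero mean, unit variance, using training-fold statistics only.
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant features
    return (train - mu) / sigma, (test - mu) / sigma

# Illustrative run on synthetic data (shapes and seed are assumptions).
X = np.random.default_rng(1).normal(size=(100, 3))
folds = five_fold_indices(len(X))
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    X_train, X_test = standardize(X[train_idx], X[test_idx])
    # ...fit the GP model on X_train, report test MNLL / RMSE on X_test...
```

After standardization, each training fold has exactly zero mean and unit variance per feature, while the test fold is transformed with the same statistics so that metrics such as normalized RMSE are comparable across folds.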