Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Gaussian Processes with Bayesian Inference of Covariate Couplings
Authors: Mattia Rosso, Juho Ylä-Jääski, Zheyang Shen, Markus Heinonen, Maurizio Filippone
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the efficacy and interpretability of this approach. We consider eight UCI datasets as a benchmark to evaluate the performance of GP models for regression and classification tasks. We standardize all datasets to zero mean and unit variance, and report all results with five-fold cross-validation. Following previous works (e.g., Rossi et al. (2021)), we report test MNLL for all data, and normalized root mean square error (RMSE) for regression and error for classification tasks. |
| Researcher Affiliation | Academia | Mattia Rosso KAUST, Saudi Arabia Juho Ylä-Jääski Aalto University, Finland Zheyang Shen Newcastle University, UK Markus Heinonen Aalto University, Finland Maurizio Filippone KAUST, Saudi Arabia |
| Pseudocode | No | The paper describes methods and models using mathematical formulations and descriptive text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce the results on the UCI and MoCap datasets can be found at https://github.com/mattyred/BayesianSGP_Automatic_Coupling_Determination and https://github.com/mattyred/gaussian-process-odes-acd/ respectively. |
| Open Datasets | Yes | We consider eight UCI datasets as a benchmark to evaluate the performance of gp models for regression and classification tasks. Table 4: UCI datasets used, including number of datapoints and dimensionalities. |
| Dataset Splits | Yes | We standardize all datasets to zero mean and unit variance, and report all results with five-fold cross-validation. |
| Hardware Specification | No | All the experiments were conducted on Google Colab. This statement indicates the environment but lacks specific hardware details such as GPU models or CPU specifications. |
| Software Dependencies | No | The paper mentions using an 'Adam optimizer' and 'stochastic gradient MCMC' but does not provide specific version numbers for any programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | In all experiments, we chose to approximate GPs with 500 inducing points. We ran BSGP for 10,000 iterations with a step-size of 0.01 and mini-batch of 1,000 data points. We evaluate performance on test data from 50 samples collected during training after 1,500 burn-in iterations and using a thinning of 180. We set the default hyperparameter of the number of SGHMC steps to K = 10. Table 3: num. of inducing points 500; mini-batch size 1,000; num. iterations 10,500; step size 0.01; momentum 0.05; num. of burn-in steps 1,500; num. of samples 50; thinning interval 180 |
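The evaluation protocol quoted in the Dataset Splits and Research Type rows (standardize to zero mean and unit variance, five-fold cross-validation) can be sketched as below. This is a minimal illustration of that protocol, not code from the paper's repositories: the helper names, the random seed, and the exact fold-splitting procedure are assumptions, and the statistics are computed on the training fold only to avoid leakage.

```python
import numpy as np

def five_fold_indices(n, seed=0):
    # Shuffle indices and split into 5 folds (illustrative; the paper
    # does not state the exact splitting procedure beyond "five-fold").
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 5)

def standardize(train, test):
    # Zero mean, unit variance, using training-fold statistics only.
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard constant features
    return (train - mu) / sigma, (test - mu) / sigma

# Illustrative run on synthetic data (shapes and seed are assumptions).
X = np.random.default_rng(1).normal(size=(100, 3))
folds = five_fold_indices(len(X))
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    X_train, X_test = standardize(X[train_idx], X[test_idx])
    # ...fit the GP model on X_train, report test MNLL / RMSE on X_test...
```

After standardization, each training fold has exactly zero mean and unit variance per feature, while the test fold is transformed with the same statistics so that metrics such as normalized RMSE are comparable across folds.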