Cluster-Specific Predictions with Multi-Task Gaussian Processes
Authors: Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performances on both clustering and prediction tasks are assessed through various simulated scenarios and real data sets. The overall algorithm, called MagmaClust, is publicly available as an R package. Keywords: Gaussian processes mixture, curve clustering, multi-task learning, variational EM, cluster-specific predictions |
| Researcher Affiliation | Academia | Arthur Leroy EMAIL Department of Computer Science, The University of Manchester, Manchester, United Kingdom; Pierre Latouche EMAIL Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France; Mines ParisTech, Centre de Géosciences, PSL Research University, F-77300 Fontainebleau, France; Université Clermont Auvergne, CNRS, LMBP, UMR 6620, Aubière, France; Benjamin Guedj EMAIL Centre for Artificial Intelligence and Department of Computer Science, University College London & Inria London, United Kingdom and France; Servane Gey EMAIL Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France |
| Pseudocode | Yes | Algorithm 1 MagmaClust: Variational EM algorithm. Initialise {m_k(t)}_k, Θ = {{γ_k}_k, {θ_i}_i, {σ_i²}_i} and {τ_i^ini}_i (or π). While not converged do — E step: optimise L(q; Θ) w.r.t. q(·): q̂_Z(Z) = ∏_{i=1}^{M} M(Z_i; 1, τ_i), q̂_µ(µ) = ∏_{k=1}^{K} N(µ_k(t); m̂_k(t), Ĉ_k^t). M step: optimise L(q; Θ) w.r.t. Θ: Θ̂ = argmax_Θ E_{Z,µ}[ log p({y_i}_i, Z, µ \| Θ) ]. End while. Return Θ̂, {τ_i}_i, {m̂_k(t)}_k, {Ĉ_k^t}_k. |
| Open Source Code | Yes | The overall algorithm, called MagmaClust, is publicly available as an R package. ... The current version of the R package implementing MagmaClust is available on the CRAN and at https://github.com/ArthurLeroy/MagmaClustR. |
| Open Datasets | Yes | The synthetic data, trained models and results are available at https://github.com/ArthurLeroy/MAGMAclust/tree/master/Simulations. The real data sets, associated trained models and results are available at https://github.com/ArthurLeroy/MAGMAclust/tree/master/Real_Data_Study. ... The 100m freestyle swimming data sets initially proposed in Leroy et al. (2018) and Leroy et al. (2022) are analysed below ... This data set (collected through the GUSTO program, see https://www.gusto.sg/) corresponds to a weight follow-up of 342 children ... historical measurements of CO2 emissions per capita for each country from 1750 to 2020 (freely available at https://github.com/owid/co2-data). |
| Dataset Splits | Yes | In the context of prediction, a new individual is generated according to the same scheme, although its first 20 data points are assumed to be observed while the remaining 10 serve as testing values. ... For all experiments, the individuals (or countries for CO2) are split into training and testing sets (in proportions 60%–40%). In the absence of expert knowledge, the prior mean functions {m_k(·)}_k are set to be constant equal to 0. ... Then, the data points of each testing individual are split for evaluation purposes between observed (the first 60%) and testing values (the remaining 40%). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running experiments. The paper discusses an optimization algorithm (L-BFGS-B) and software packages but not the underlying hardware. |
| Software Dependencies | No | The paper mentions "R package" and "The Multi-Output Gaussian Process Toolkit (MOGPTK)" as software used, but does not provide specific version numbers for these dependencies to ensure reproducibility. |
| Experiment Setup | Yes | Throughout, the exponentiated quadratic (EQ) kernel, as defined in Equation (1), serves as covariance structure for both generating data and modelling. ... The mean functions {m_k(·)}_k are set to be 0 in MagmaClust, as usual for GPs, whereas the membership probabilities τ_ik are initialised thanks to a preliminary k-means algorithm. ... In the absence of expert knowledge, the prior mean functions {m_k(·)}_k are set to be constant equal to 0. The hypothesis H00 is specified along with random initialisations for the hyper-parameters. |
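The exponentiated quadratic kernel quoted in the Experiment Setup row can be sketched in a few lines. Below is a minimal NumPy version; the `variance` and `lengthscale` parameter names are illustrative and do not follow the paper's notation.

```python
import numpy as np

def eq_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic (EQ) covariance between two sets of inputs.

    k(t, t') = variance * exp(-(t - t')^2 / (2 * lengthscale^2))
    """
    t1 = np.asarray(t1, dtype=float).reshape(-1, 1)
    t2 = np.asarray(t2, dtype=float).reshape(1, -1)
    sq_dist = (t1 - t2) ** 2
    return variance * np.exp(-sq_dist / (2.0 * lengthscale ** 2))

# Covariance matrix over a small grid of timestamps
K = eq_kernel([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
```

The diagonal of `K` equals `variance`, and off-diagonal entries decay with squared distance between timestamps, as expected for an EQ kernel.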
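The paper initialises the membership probabilities τ_ik with a preliminary k-means algorithm. A plausible reconstruction is a one-hot assignment obtained from plain Lloyd's iterations over the individuals' curves (one row per individual); this is an assumed sketch, not MagmaClust's actual initialisation code.

```python
import numpy as np

def init_memberships(curves, n_clusters, n_iter=20, seed=0):
    """One-hot tau_ik matrix from a basic k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(curves, dtype=float)
    # Initialise centres on distinct individuals
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each individual to its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres as cluster means (skip empty clusters)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    tau = np.zeros((len(X), n_clusters))
    tau[np.arange(len(X)), labels] = 1.0
    return tau

tau = init_memberships([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], n_clusters=2)
```

Each row of `tau` sums to 1, giving hard initial memberships that the VEM updates then soften into probabilities.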
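The two-stage split described in the Dataset Splits row (individuals into 60% training / 40% testing, then each testing individual's time-ordered points into the first 60% observed / remaining 40% testing) can be sketched as follows; the function names and default proportions are illustrative.

```python
import random

def split_individuals(individual_ids, train_prop=0.6, seed=0):
    """Split individuals (or countries) into training and testing sets."""
    ids = list(individual_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_prop * len(ids))
    return ids[:n_train], ids[n_train:]

def split_observations(points, observed_prop=0.6):
    """For a testing individual, keep the first 60% of time-ordered points
    as observed and the remaining 40% as testing values."""
    pts = sorted(points)  # assumes points are orderable timestamps
    n_obs = round(observed_prop * len(pts))
    return pts[:n_obs], pts[n_obs:]

train_ids, test_ids = split_individuals(range(100))
obs, test = split_observations(range(10))
```

Note the simulated-prediction setting quoted above uses a fixed 20/10 split per new individual rather than proportions; the sketch covers only the 60%–40% scheme.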