Cluster-Specific Predictions with Multi-Task Gaussian Processes
Authors: Arthur Leroy, Pierre Latouche, Benjamin Guedj, Servane Gey
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performances on both clustering and prediction tasks are assessed through various simulated scenarios and real data sets. The overall algorithm, called MagmaClust, is publicly available as an R package. Keywords: Gaussian processes mixture, curve clustering, multi-task learning, variational EM, cluster-specific predictions |
| Researcher Affiliation | Academia | Arthur Leroy EMAIL Department of Computer Science, The University of Manchester, Manchester, United Kingdom; Pierre Latouche EMAIL Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France; Mines ParisTech, Centre de Géosciences, PSL Research University, F-77300 Fontainebleau, France; Université Clermont Auvergne, CNRS, LMBP, UMR 6620, Aubière, France; Benjamin Guedj EMAIL Centre for Artificial Intelligence and Department of Computer Science, University College London & Inria London, United Kingdom and France; Servane Gey EMAIL Université Paris Cité, CNRS, MAP5 UMR 8145, F-75006 Paris, France |
| Pseudocode | Yes | Algorithm 1 MagmaClust: Variational EM algorithm. Initialise {m_k(t)}_k, Θ = {{γ_k}_k, {θ_i}_i, {σ_i²}_i} and {τ_i^ini}_i (or π). While not converged do — E step: optimise L(q; Θ) w.r.t. q(·): q̂_Z(Z) = ∏_{i=1}^{M} M(Z_i; 1, τ_i), q̂_µ(µ) = ∏_{k=1}^{K} N(µ_k(t); m̂_k(t), Ĉ_k^t). M step: optimise L(q; Θ) w.r.t. Θ: Θ̂ = argmax_Θ E_{Z,µ}[ log p({y_i}_i, Z, µ \| Θ) ]. End while. Return Θ̂, {τ_i}_i, {m̂_k(t)}_k, {Ĉ_k^t}_k. |
| Open Source Code | Yes | The overall algorithm, called MagmaClust, is publicly available as an R package. ... The current version of the R package implementing MagmaClust is available on the CRAN and at https://github.com/ArthurLeroy/MagmaClustR. |
| Open Datasets | Yes | The synthetic data, trained models and results are available at https://github.com/ArthurLeroy/MAGMAclust/tree/master/Simulations. The real data sets, associated trained models and results are available at https://github.com/ArthurLeroy/MAGMAclust/tree/master/Real_Data_Study. ... The 100m freestyle swimming data sets initially proposed in Leroy et al. (2018) and Leroy et al. (2022) are analysed below ... This data set (collected through the GUSTO program, see https://www.gusto.sg/) corresponds to a weight follow-up of 342 children ... historical measurements of CO2 emissions per capita for each country from 1750 to 2020 (freely available at https://github.com/owid/co2-data). |
| Dataset Splits | Yes | In the context of prediction, a new individual is generated according to the same scheme, although its first 20 data points are assumed to be observed while the remaining 10 serve as testing values. ... For all experiments, the individuals (or countries for CO2) are split into training and testing sets (in proportions 60%–40%). In the absence of expert knowledge, the prior mean functions {m_k(·)}_k are set to be constant equal to 0. ... Then, the data points of each testing individual are split for evaluation purposes between observed (the first 60%) and testing values (the remaining 40%). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running experiments. The paper discusses an optimization algorithm (L-BFGS-B) and software packages but not the underlying hardware. |
| Software Dependencies | No | The paper mentions "R package" and "The Multi-Output Gaussian Process Toolkit (MOGPTK)" as software used, but does not provide specific version numbers for these dependencies to ensure reproducibility. |
| Experiment Setup | Yes | Throughout, the exponentiated quadratic (EQ) kernel, as defined in Equation (1), serves as covariance structure for both generating data and modelling. ... The mean functions {m_k(·)}_k are set to be 0 in MagmaClust, as usual for GPs, whereas the membership probabilities τ_ik are initialised thanks to a preliminary k-means algorithm. ... In the absence of expert knowledge, the prior mean functions {m_k(·)}_k are set to be constant equal to 0. The hypothesis H00 is specified along with random initialisations for the hyper-parameters. |
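The exponentiated quadratic kernel quoted in the Experiment Setup row can be sketched in a few lines. Below is a minimal NumPy version; the `variance` and `lengthscale` parameter names are illustrative and do not follow the paper's notation.

```python
import numpy as np

def eq_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Exponentiated quadratic (EQ) covariance between two sets of inputs.

    k(t, t') = variance * exp(-(t - t')^2 / (2 * lengthscale^2))
    """
    t1 = np.asarray(t1, dtype=float).reshape(-1, 1)
    t2 = np.asarray(t2, dtype=float).reshape(1, -1)
    sq_dist = (t1 - t2) ** 2
    return variance * np.exp(-sq_dist / (2.0 * lengthscale ** 2))

# Covariance matrix over a small grid of timestamps
K = eq_kernel([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
```

The diagonal of `K` equals `variance`, and off-diagonal entries decay with squared distance between timestamps, as expected for an EQ kernel.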
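The paper initialises the membership probabilities τ_ik with a preliminary k-means algorithm. A plausible reconstruction is a one-hot assignment obtained from plain Lloyd's iterations over the individuals' curves (one row per individual); this is an assumed sketch, not MagmaClust's actual initialisation code.

```python
import numpy as np

def init_memberships(curves, n_clusters, n_iter=20, seed=0):
    """One-hot tau_ik matrix from a basic k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(curves, dtype=float)
    # Initialise centres on distinct individuals
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each individual to its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres as cluster means (skip empty clusters)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    tau = np.zeros((len(X), n_clusters))
    tau[np.arange(len(X)), labels] = 1.0
    return tau

tau = init_memberships([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], n_clusters=2)
```

Each row of `tau` sums to 1, giving hard initial memberships that the VEM updates then soften into probabilities.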
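The two-stage split described in the Dataset Splits row (individuals into 60% training / 40% testing, then each testing individual's time-ordered points into the first 60% observed / remaining 40% testing) can be sketched as follows; the function names and default proportions are illustrative.

```python
import random

def split_individuals(individual_ids, train_prop=0.6, seed=0):
    """Split individuals (or countries) into training and testing sets."""
    ids = list(individual_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_prop * len(ids))
    return ids[:n_train], ids[n_train:]

def split_observations(points, observed_prop=0.6):
    """For a testing individual, keep the first 60% of time-ordered points
    as observed and the remaining 40% as testing values."""
    pts = sorted(points)  # assumes points are orderable timestamps
    n_obs = round(observed_prop * len(pts))
    return pts[:n_obs], pts[n_obs:]

train_ids, test_ids = split_individuals(range(100))
obs, test = split_observations(range(10))
```

Note the simulated-prediction setting quoted above uses a fixed 20/10 split per new individual rather than proportions; the sketch covers only the 60%–40% scheme.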