Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference
Authors: Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance of the model by benchmarking against other MOGP models on several real-world datasets, including spatio-temporal climate modelling and spatial transcriptomics. |
| Researcher Affiliation | Academia | Xiaoyu Jiang EMAIL Department of Computer Science University of Manchester, Manchester Sokratia Georgaka EMAIL Faculty of Biology, Medicine and Health University of Manchester, Manchester Magnus Rattray EMAIL Faculty of Biology, Medicine and Health University of Manchester, Manchester Mauricio A. Álvarez EMAIL Department of Computer Science University of Manchester, Manchester |
| Pseudocode | No | The paper describes the Generalized Scalable Latent Variable Multi-Output Gaussian Process (GS-LVMOGP) model and its variational inference approach in detail, including mathematical formulations for the prior distributions, variational distributions, and the evidence lower bound. However, it does not present these procedures in explicitly structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation of our model can be found at https://github.com/XiaoyuJiang17/GS-LVMOGP. |
| Open Datasets | Yes | NYC Crimes 2014: 2014-2015 crimes reported in all 5 boroughs of New York City. https://www.kaggle.com/adamschroeder/crimes-new-york-city, 2015. Accessed: 2024-05-19. The dataset can be downloaded from https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cmip5-monthly-single-levels?tab=form. The United States Historical Climatology Network (USHCN) dataset, as detailed by Menne et al. (2015), includes records of five climate variables across a span of over 150 years at 1,218 meteorological stations throughout the United States. It is publicly available and can be downloaded at the following address: https://cdiac.ess-dive.lbl.gov/ftp/ushcn_daily/. 10X Genomics 2024: https://www.10xgenomics.com/resources/datasets/human-prostate-cancer-adenocarcinoma-with-invasive-carcinoma-ffpe-1-standard-1-3-0, 2024. Accessed: 2024-05-19. |
| Dataset Splits | Yes | Results are averages over 5-fold cross-validation with standard deviation. For each output, we randomly select 10 data points from the first 263 observations as training data, using the remaining 253 months for imputation testing. The last 100 observations for each output serve as extrapolation test samples. Our task is to forecast the subsequent three observations following the initial three years of data collection. |
| Hardware Specification | Yes | The experiments are run on a MacBook Pro with an M3 Max and 36 GB of RAM. |
| Software Dependencies | No | The paper mentions using "Adam optimiser (Kingma & Ba, 2014)" and "Scanpy's highly variable genes function (Wolf et al., 2018)" and the use of "Gauss-Hermite quadrature" for numerical integration. However, it does not provide specific version numbers for any software libraries or packages used in the implementation or experiments. |
| Experiment Setup | Yes | Some hyperparameters used in experiments are shown in Table 5. Table 6 provides details on the initialisation of the kernel parameters and the mean of the variational distribution of the latent variables. Table 5: MH refers to the number of inducing points on the latent space, MX refers to the number of inducing points on the input space. QH denotes the dimensionality of the latent space. J is the number of samples used in the Monte Carlo estimation of the integration w.r.t. q(Hd). lr refers to learning rates. Mini-batch size and the number of iterations are also reported. All experiments use Adam optimiser (Kingma & Ba, 2014). Table 6: More details about the initialisation of the kernel parameters and the latent variables. |
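The imputation/extrapolation split described above (10 random training points from the first 263 observations per output, the remaining 253 for imputation testing, and the last 100 held out for extrapolation) can be sketched as follows. This is a minimal illustration of the stated protocol, not the authors' code; the function name and the synthetic series are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_output(y, n_train=10, n_obs=263, n_extrap=100):
    """Split one output series following the protocol reported in the paper:
    randomly pick `n_train` of the first `n_obs` points for training,
    use the remaining points among those `n_obs` for imputation testing,
    and reserve the final `n_extrap` points for extrapolation testing."""
    assert len(y) >= n_obs + n_extrap
    idx = rng.permutation(n_obs)
    train_idx = np.sort(idx[:n_train])          # 10 training indices
    imput_idx = np.sort(idx[n_train:])          # 253 imputation-test indices
    extrap_idx = np.arange(len(y) - n_extrap, len(y))  # last 100 indices
    return train_idx, imput_idx, extrap_idx

# Example with a synthetic monthly series of 363 observations
y = np.sin(np.linspace(0, 12 * np.pi, 363))
train_idx, imput_idx, extrap_idx = split_output(y)
print(len(train_idx), len(imput_idx), len(extrap_idx))  # 10 253 100
```

In a 5-fold setup, this split would be repeated with different random seeds and the metrics averaged, matching the reported mean and standard deviation.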