Scalable Multi-Output Gaussian Processes with Stochastic Variational Inference

Authors: Xiaoyu Jiang, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the performance of the model by benchmarking against some other MOGP models in several real-world datasets, including spatial-temporal climate modelling and spatial transcriptomics."
Researcher Affiliation | Academia | Xiaoyu Jiang, Department of Computer Science, University of Manchester; Sokratia Georgaka, Faculty of Biology, Medicine and Health, University of Manchester; Magnus Rattray, Faculty of Biology, Medicine and Health, University of Manchester; Mauricio A. Álvarez, Department of Computer Science, University of Manchester
Pseudocode | No | The paper describes the Generalized Scalable Latent Variable Multi-Output Gaussian Process (GS-LVMOGP) model and its variational inference approach in detail, including mathematical formulations for the prior distributions, variational distributions, and the evidence lower bound, but it does not present these procedures as explicitly structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The implementation of our model can be found in https://github.com/XiaoyuJiang17/GS-LVMOGP."
Open Datasets | Yes | NYC Crimes 2014: 2014-2015 crimes reported in all 5 boroughs of New York City, https://www.kaggle.com/adamschroeder/crimes-new-york-city (accessed 2024-05-19). CMIP5 climate projections, downloadable from https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cmip5-monthly-single-levels?tab=form. The United States Historical Climatology Network (USHCN) dataset (Menne et al., 2015) includes records of five climate variables spanning over 150 years at 1,218 meteorological stations throughout the United States; it is publicly available at https://cdiac.ess-dive.lbl.gov/ftp/ushcn_daily/. 10x Genomics spatial transcriptomics data, https://www.10xgenomics.com/resources/datasets/human-prostate-cancer-adenocarcinoma-with-invasive-carcinoma-ffpe-1-standard-1-3-0 (accessed 2024-05-19).
Dataset Splits | Yes | Results are averages over 5-fold cross-validation, reported with standard deviations. For each output, 10 data points are randomly selected from the first 263 observations as training data, the remaining 253 months are used for imputation testing, and the last 100 observations per output serve as extrapolation test samples. A separate forecasting task predicts the subsequent three observations following the initial three years of data collection.
Hardware Specification | Yes | The experiments are run on a MacBook Pro with an M3 Max chip and 36 GB of RAM.
Software Dependencies | No | The paper mentions using the "Adam optimiser (Kingma & Ba, 2014)", "Scanpy's highly variable genes function (Wolf et al., 2018)", and Gauss-Hermite quadrature for numerical integration, but it does not provide version numbers for any software libraries or packages used in the implementation or experiments.
Experiment Setup | Yes | Hyperparameters used in the experiments are shown in Table 5: M_H is the number of inducing points in the latent space, M_X the number of inducing points in the input space, Q_H the dimensionality of the latent space, and J the number of samples used in the Monte Carlo estimation of the integral w.r.t. q(H_d); learning rates, mini-batch sizes, and iteration counts are also reported. Table 6 details the initialisation of the kernel parameters and the mean of the variational distribution of the latent variables. All experiments use the Adam optimiser (Kingma & Ba, 2014).