Embarrassingly Parallel Inference for Gaussian Processes

Authors: Michael Minyi Zhang, Sinead A. Williamson

JMLR 2019

Reproducibility Variable Result LLM Response
Research Type Experimental We begin by evaluating our method on synthetically generated data, in order to allow us to explore and visualize a range of regimes, and to allow comparison with methods that do not scale to our real-world dataset. In our studies, we will compare our Importance Sampled Mixture of Experts approach (IS-MOE) against a full Gaussian process (GP); three sparse approximations to this model: FITC (Snelson and Ghahramani, 2005), DTC (Seeger et al., 2003), and SVI (Hensman et al., 2013); the Bayesian treed GP (Gramacy and Lee, 2008, BTGP); and the robust Bayesian committee machine (Deisenroth and Ng, 2015, RBCM).
Researcher Affiliation Academia Michael Minyi Zhang (EMAIL), Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Sinead A. Williamson (EMAIL), Department of Statistics and Data Science and Department of Information, Risk and Operations Management, The University of Texas at Austin, Austin, TX 78712, USA.
Pseudocode Yes Algorithm 1: Importance Sampled Mixture of Experts (IS-MOE)
  for j = 1, ..., J in parallel do
    Draw a partition Z_j of the data with K clusters from P(Z | X).
    Fit K independent GP models on the partitioned data.
    Predict new observations on each importance sample with P(f*_j | Z_j, θ) = Σ_{k=1}^K P(f*_j | Z*_j, θ) P(Z*_j | θ).
    Obtain weights w_j = ∏_{k=1}^K P(Y_{k,j} | X_{k,j}, Z_j).
    Normalize weights: w_j := w_j / Σ_{j'=1}^J w_{j'}.
  Average predictions using importance weights: P(f* | θ) = Σ_{j=1}^J w_j P(f*_j | Z_j, θ).
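The pseudocode above can be sketched in Python. This is not the authors' implementation (their released code uses GPy with mpi4py); it is a minimal serial sketch using scikit-learn, with two simplifying assumptions: partitions Z_j are drawn uniformly at random rather than from P(Z|X), and test predictions average the K experts uniformly instead of assigning test points to clusters.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ismoe_predict(X, y, X_test, J=10, K=5, seed=0):
    """Sketch of IS-MOE (Algorithm 1): J importance samples, each a
    random K-way partition of the data with one GP expert per cluster.
    Sample j is weighted by the product of its experts' marginal
    likelihoods (accumulated in the log domain for stability)."""
    rng = np.random.default_rng(seed)
    preds = np.zeros((J, len(X_test)))
    log_w = np.zeros(J)
    for j in range(J):                       # "in parallel" in the paper
        z = rng.integers(0, K, size=len(X))  # simplified stand-in for P(Z|X)
        for k in range(K):
            idx = z == k
            if not idx.any():                # guard against an empty cluster
                continue
            gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2)
            gp.fit(X[idx], y[idx])
            # accumulate log P(Y_{k,j} | X_{k,j}, Z_j)
            log_w[j] += gp.log_marginal_likelihood_value_
            # simplification: uniform averaging over experts at test points
            preds[j] += gp.predict(X_test) / K
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                             # normalized importance weights
    return w @ preds                         # importance-weighted prediction
```

In the paper each importance sample is fit and evaluated in parallel (via MPI), which is what makes the method "embarrassingly parallel"; the loop over j here is purely sequential for clarity.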
Open Source Code Yes The code is available at https://github.com/michaelzhang01/ISMOE.
Open Datasets Yes We used an empirical dataset consisting of 209,631 mid-tropospheric CO2 measurements over space and time from the Atmospheric Infrared Sounder (AIRS), available in the R package FRK as AIRS 05 2003. ... on three classification datasets from the UCI repository: the Pima Indians diabetes dataset; the Parkinsons dataset; and the Wisconsin diagnostic breast cancer (WDBC) dataset. All empirical classification datasets are available in the UCI repository at http://archive.ics.uci.edu/ml/.
Dataset Splits Yes We generated a training data set with 1,000 observations and a test set with 100 observations. ... We trained the IS-MOE using a range of values for B, K and J, over 20 cross-validation splits. ... We evaluated performance over 20 cross-validation splits. ... Our training data contains one million observations and 28 features, with a test set of 100,000 observations.
Hardware Specification No No specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments are provided. The paper only mentions 'GPU-based computation' as a potential avenue for future research.
Software Dependencies No Our IS-MOE code uses the Gaussian process modules in GPy in Python with parallelization executed through mpi4py (Dalcín et al., 2005). We ran the full GP, FITC, DTC and SVI implementations also through GPy, BTGP in tgp, and RBCM in gptf.
Experiment Setup Yes All models use a squared exponential covariance matrix. ... For the sparse methods, we used M = 100 inducing points, and for the local methods (including the IS-MOE) we used K = 10 partitions to have a comparable level of computational complexity. For the BTGP we ran the MCMC sampler for 10 iterations; for the IS-MOE we used J = 10 independent importance-weighted samples. ... For IS-MOE, we set J = 100 and B = 1000 and explored a range of values of K; for SVI we chose values for inducing points that gave a comparable level of computational complexity. ... with J = 128, K = 20 and B = 1000.