Compressed Gaussian Process for Manifold Regression

Authors: Rajarshi Guhaniyogi, David B. Dunson

JMLR 2016

Reproducibility Assessment (Variable: Result, with supporting excerpt)
Research Type: Experimental. "We assess the performance of compressed Gaussian process (CGP) regression in a number of simulation examples. We consider various numbers of features (p) and levels of noise in the features (τ) to study their impact on performance. In all the simulations, out-of-sample predictive performance of the proposed CGP regression was compared to that of uncompressed Gaussian process (GP) regression, BART (Bayesian Additive Regression Trees; Chipman et al., 2010), RF (Random Forests; Breiman, 2001) and TGP (Treed Gaussian Process; Gramacy and Lee, 2008). ... Predictive MSE for each of the simulation settings, averaged over 50 simulated datasets, is shown in Table 2. ... Boxplots of coverage probabilities in all the simulation cases are presented in Figure 2. Figure 3 presents median lengths of the 95% predictive intervals. ... In this section we present an application in which both the dimension and the structure of the underlying manifold are unknown. The dataset consists of 698 images of an artificial face and is referred to as the Isomap face data (Tenenbaum et al., 2000). ... We apply CGP and all the competitors to the dataset to assess relative performance. ... This experiment is repeated 50 times. Table 5 presents MSPE for all the competing methods averaged over 50 experiments, along with standard errors computed using 100 bootstrap samples."
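The headline numbers quoted above (MSPE averaged over 50 repeated experiments, with standard errors from 100 bootstrap resamples) follow a standard recipe. A minimal pure-Python sketch of that aggregation step, with all data and names hypothetical rather than taken from the paper's code:

```python
import random
import statistics

def bootstrap_se(values, n_boot=100, seed=0):
    """Standard error of the mean via the nonparametric bootstrap:
    resample the per-run values with replacement n_boot times and
    take the standard deviation of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Hypothetical per-experiment MSPE values from 50 repeated splits.
mspe_runs = [0.9 + 0.05 * random.Random(i).random() for i in range(50)]
avg_mspe = statistics.fmean(mspe_runs)          # value reported in a table cell
se_mspe = bootstrap_se(mspe_runs, n_boot=100)   # its bootstrap standard error
```

The same pattern applies to coverage and interval-length summaries; only the per-run statistic changes.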
Researcher Affiliation: Academia. Rajarshi Guhaniyogi, Department of Applied Mathematics & Statistics, University of California, Santa Cruz, CA 95064, USA; David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA.
Pseudocode: Yes. "Algorithm 1: Spectral Clustering Algorithm. Input: features x1, ..., xn and the number of clusters required, n.clust."
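Algorithm 1 is only excerpted here. Its opening step, building a similarity graph over the features and forming its normalized Laplacian, can be sketched in pure Python; the Gaussian-kernel bandwidth and all names are assumptions, and the later eigendecomposition and k-means steps of spectral clustering are omitted:

```python
import math

def normalized_laplacian(xs, bandwidth=1.0):
    """Form L_sym = I - D^{-1/2} W D^{-1/2}, where W is a Gaussian
    similarity matrix over the feature vectors xs and D is the
    diagonal degree matrix of row sums of W."""
    n = len(xs)
    # Gaussian-kernel similarities (self-similarity 1 on the diagonal).
    W = [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                   / (2.0 * bandwidth ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    return [[(1.0 if i == j else 0.0)
             - W[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]

# Three toy 2-D features: two close together, one far away.
L = normalized_laplacian([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])
```

A full spectral clustering run would take the eigenvectors for the n.clust smallest eigenvalues of L and cluster their rows with k-means.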
Open Source Code: No. The paper uses and cites several existing R packages (glmnet, randomForest, BayesTree, tgp) for the comparison methods, but provides no link to, or explicit statement about, open-source code for the Compressed Gaussian Process (CGP) methodology itself.
Open Datasets: Yes. "The dataset consists of 698 images of an artificial face and is referred to as the Isomap face data (Tenenbaum et al., 2000). ... More details about the dataset can be found at http://isomap.stanford.edu/datasets.html."
Dataset Splits: Yes. "We carry out random splitting of the data into n = 648 training cases and npred = 50 test cases and run all the competitors to obtain predictive inference in terms of MSPE, length and coverage of 95% predictive intervals."
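The split-and-score loop described in that excerpt is straightforward to reproduce. A hedged pure-Python sketch, using synthetic data and a naive training-mean predictor as a stand-in for the model's posterior predictive mean and quantiles (names and details are illustrative, not the paper's code):

```python
import random
import statistics

def evaluate_split(pairs, n_test=50, seed=0):
    """Randomly hold out n_test cases, 'predict' each with the
    training mean plus a fixed-width 95% interval, and report
    MSPE, empirical coverage, and interval length."""
    rng = random.Random(seed)
    data = pairs[:]
    rng.shuffle(data)
    test, train = data[:n_test], data[n_test:]
    y_train = [y for _, y in train]
    mu = statistics.fmean(y_train)
    sd = statistics.stdev(y_train)
    lo, hi = mu - 1.96 * sd, mu + 1.96 * sd   # crude normal 95% interval
    y_test = [y for _, y in test]
    mspe = statistics.fmean((y - mu) ** 2 for y in y_test)
    coverage = statistics.fmean(1.0 if lo <= y <= hi else 0.0 for y in y_test)
    return mspe, coverage, hi - lo

# 698 synthetic (feature, response) pairs, mirroring the 648/50 split size.
pairs = [((float(i),), random.Random(i).gauss(0.0, 1.0)) for i in range(698)]
mspe, coverage, length = evaluate_split(pairs, n_test=50)
```

Repeating this over many random seeds and averaging gives the tables of MSPE, coverage, and interval length reported in the paper.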
Hardware Specification: No. The paper mentions running R code on a "standard server" and discusses parallel processing, but does not specify any particular CPU model, GPU model, or other detailed hardware specifications.
Software Dependencies: No. "To implement LASSO, we use the glmnet (Friedman et al., 2009) package in R with the optimal tuning parameter selected through 10-fold cross-validation. [We implement] CRF, CBART and CTGP in R using the randomForest (Liaw and Wiener, 2002), BayesTree (Chipman et al., 2009) and tgp (Gramacy, 2007) packages, respectively." The paper names specific software packages but provides neither their version numbers nor the version of R used.
Experiment Setup: Yes. "As a default in these analyses, we use m = 60, which seems to be a reasonable choice of upper bound for the dimension of the linear subspace to compress to. ... The number of rows of Φ is fixed at mΦ = 100 for the simulation study with moderately large n. ... To implement LASSO, we use the glmnet (Friedman et al., 2009) package in R with the optimal tuning parameter selected through 10-fold cross-validation."
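The compression that m = 60 refers to is multiplication of each feature vector by a random projection matrix. A minimal pure-Python sketch of projecting p-dimensional features down to an m-dimensional subspace; the i.i.d. N(0, 1/m) entries are one common choice for such matrices, not necessarily the paper's exact construction, and all names are hypothetical:

```python
import random

def random_projection(xs, m, seed=0):
    """Compress each p-vector x to Psi @ x, where Psi is an m x p
    matrix with i.i.d. N(0, 1/m) entries, so that squared norms
    are preserved in expectation."""
    rng = random.Random(seed)
    p = len(xs[0])
    psi = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(p)]
           for _ in range(m)]
    # Matrix-vector products: one compressed m-vector per input x.
    return [[sum(row[k] * x[k] for k in range(p)) for row in psi]
            for x in xs]

# Compress 5 hypothetical feature vectors from p = 500 down to m = 60.
feats = [[random.Random(i * 500 + j).uniform(-1.0, 1.0) for j in range(500)]
         for i in range(5)]
compressed = random_projection(feats, m=60)
```

The downstream regression (GP, LASSO, etc.) is then fit on the compressed features instead of the original p-dimensional ones.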