Spatial meshing for general Bayesian multivariate models
Authors: Michele Peruzzi, David B. Dunson
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance and efficiency improvements of our methods relative to alternatives in extensive synthetic and real world remote sensing and community ecology applications with large scale data at up to hundreds of thousands of spatial locations and up to tens of outcomes. |
| Researcher Affiliation | Academia | Michele Peruzzi EMAIL Department of Biostatistics University of Michigan Ann Arbor, MI 48109-2029, USA David B. Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708-0251, USA |
| Pseudocode | Yes | Algorithm 1 Posterior sampling of spatially meshed model (1) and predictions. ... Algorithm 2 The mth iteration of Simplified Manifold Preconditioner Adaptation. ... Algorithm 3 Posterior sampling and prediction of LMC model (1) with MGP priors. ... Algorithm 4 Posterior sampling of model (12). ... Algorithm 5 Posterior sampling and prediction of LMC model (19) with MGP priors. |
| Open Source Code | Yes | Software for the proposed methods is part of R package meshed, available on CRAN. |
| Open Datasets | Yes | We consider remotely sensed leaf area and snow cover data from the MODerate resolution Imaging Spectroradiometer (MODIS) on the Terra satellite operated by NASA (v.6.1) at 122,500 total locations... The North American Breeding Bird Survey dataset contains avian point count data for more than 700 North American bird taxa |
| Dataset Splits | Yes | We create a test set by introducing missingness in LAI at 10,000 spatial locations, of which 5030 are chosen uniformly at random across the whole domain and 4970 belong to a contiguous rectangular region as displayed on the bottom left subplot of Figure 8a. ... Lastly, we create a test set using 20% of the locations for each outcome (missing data locations differ for each outcome). ... for each of the 4 species, we introduce missingness by independently dropping 20% of the observed count data from the training set uniformly at random |
| Hardware Specification | Yes | In all our applications, all methods are configured to use up to 16 CPU threads in a workstation with 128GB memory and an AMD Ryzen 9 5950X CPU on the Ubuntu 22.04.2 LTS operating system |
| Software Dependencies | Yes | Software for the proposed methods and the related posterior sampling algorithms is available as part of the meshed package for R, available on CRAN. ... In all our applications, all methods are configured to use up to 16 CPU threads... using Intel MKL version 2019.5.28 for BLAS/LAPACK. R package meshed (v.0.2) allows one to set the number of OpenMP (Dagum and Menon, 1998) threads, whereas Hmsc takes advantage of parallelization via BLAS when performing expensive operations (e.g., chol()). The R-INLA package used to implement SPDE-INLA methods can similarly take advantage of multithreaded operations. |
| Experiment Setup | Yes | All MCMC-based results are based on chains of length 30,000. All gradient-based methods share the dual averaging setup for adapting the step size ε and are thus allowed the same burn-in period. Finally, we implement an MCMC-free stochastic partial differential equations method (SPDE; Lindgren et al., 2011) fit via INLA. ... We conservatively choose T = 500, a = 1/3, κ = 1/100 as these values allowed ample burn-in time for all spatial nodes in all our applications. ... MCMC for all methods was run for 10,000 iterations, of which the first 5,000 are discarded as burn-in. |
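
The per-outcome hold-out described under Dataset Splits and the burn-in rule described under Experiment Setup can be illustrated with a minimal sketch. This is not the authors' code and does not use the meshed package: the function names `hold_out_per_outcome` and `discard_burn_in`, the toy data, and the seed are all hypothetical, chosen only to show the idea of masking 20% of locations independently for each outcome and dropping the first 5,000 of 10,000 draws.

```python
import random


def hold_out_per_outcome(y, frac=0.2, seed=0):
    """Mask a fraction of rows independently for each outcome column.

    y is a list of rows (one per location), each a list of q outcome values.
    Returns (train, test_idx): train is a copy of y with held-out entries set
    to None, and test_idx[j] lists the held-out rows for outcome j.
    Hypothetical helper, not part of any package named in the table.
    """
    rng = random.Random(seed)
    n, q = len(y), len(y[0])
    n_test = round(frac * n)
    train = [list(row) for row in y]  # copy so the input is not modified
    test_idx = []
    for j in range(q):
        # A separate uniform draw per outcome, so missing locations differ.
        rows = sorted(rng.sample(range(n), n_test))
        for i in rows:
            train[i][j] = None
        test_idx.append(rows)
    return train, test_idx


def discard_burn_in(chain, n_burn):
    """Drop the first n_burn draws of a chain stored as a sequence."""
    return chain[n_burn:]


# Toy data (assumption): 50 locations, 4 outcomes, placeholder values.
y = [[float(i + j) for j in range(4)] for i in range(50)]
train, test_idx = hold_out_per_outcome(y, frac=0.2, seed=1)

# Stand-in for 10,000 MCMC draws; keep only the post-burn-in half.
kept = discard_burn_in(list(range(10_000)), 5_000)
```

Drawing the held-out rows separately inside the loop is what makes the missing locations differ across outcomes; a single shared draw would instead remove whole locations.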