reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Spatial meshing for general Bayesian multivariate models

Authors: Michele Peruzzi, David B. Dunson

JMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the performance and efficiency improvements of our methods relative to alternatives in extensive synthetic and real world remote sensing and community ecology applications with large scale data at up to hundreds of thousands of spatial locations and up to tens of outcomes.
Researcher Affiliation	Academia	Michele Peruzzi EMAIL Department of Biostatistics University of Michigan Ann Arbor, MI 48109-2029, USA David B. Dunson EMAIL Department of Statistical Science Duke University Durham, NC 27708-0251, USA
Pseudocode	Yes	Algorithm 1 Posterior sampling of spatially meshed model (1) and predictions. ... Algorithm 2 The mth iteration of Simplified Manifold Preconditioner Adaptation. ... Algorithm 3 Posterior sampling and prediction of LMC model (1) with MGP priors. ... Algorithm 4 Posterior sampling of model (12). ... Algorithm 5 Posterior sampling and prediction of LMC model (19) with MGP priors.
Open Source Code	Yes	Software for the proposed methods is part of R package meshed, available on CRAN.
Open Datasets	Yes	We consider remotely sensed leaf area and snow cover data from the MODerate resolution Imaging Spectroradiometer (MODIS) on the Terra satellite operated by NASA (v.6.1) at 122,500 total locations... The North American Breeding Bird Survey dataset contains avian point count data for more than 700 North American bird taxa
Dataset Splits	Yes	We create a test set by introducing missingness in LAI at 10,000 spatial locations, of which 5030 are chosen uniformly at random across the whole domain and 4970 belong to a contiguous rectangular region as displayed on the bottom left subplot of Figure 8a. ... Lastly, we create a test set using 20% of the locations for each outcome (missing data locations differ for each outcome). ... for each of the 4 species, we introduce missingness by independently dropping 20% of the observed count data from the training set uniformly at random
Hardware Specification	Yes	In all our applications, all methods are configured to use up to 16 CPU threads in a workstation with 128GB memory and an AMD Ryzen 9 5950X CPU on the Ubuntu 22.04.2 LTS operating system
Software Dependencies	Yes	Software for the proposed methods and the related posterior sampling algorithms is available as part of the meshed package for R, available on CRAN. ... In all our applications, all methods are configured to use up to 16 CPU threads... using Intel MKL version 2019.5.28 for BLAS/LAPACK. R package meshed (v.0.2) allows one to set the number of OpenMP (Dagum and Menon, 1998) threads, whereas Hmsc takes advantage of parallelization via BLAS when performing expensive operations (e.g., chol()). The R-INLA package used to implement SPDE-INLA methods can similarly take advantage of multithreaded operations.
Experiment Setup	Yes	All MCMC-based results are based on chains of length 30,000. All gradient-based methods share the dual averaging setup for adapting the step size ε and are thus allowed the same burn-in period. Finally, we implement an MCMC-free stochastic partial differential equations method (SPDE; Lindgren et al., 2011) fit via INLA. ... We conservatively choose T = 500, a = 1/3, κ = 1/100 as these values allowed ample burn-in time for all spatial nodes in all our applications. ... MCMC for all methods was run for 10,000 iterations, of which the first 5,000 are discarded as burn-in.
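
The per-outcome 20% hold-out quoted under Dataset Splits and the burn-in discard quoted under Experiment Setup can be illustrated with a minimal Python sketch. This is not the authors' code (their software is the R package meshed); the function names, the seed, and the toy sizes are hypothetical and chosen only for illustration.

```python
import random

def hold_out_per_outcome(n_locations, n_outcomes, frac=0.2, seed=0):
    """Hypothetical helper: for each outcome, independently draw a
    uniformly random subset of location indices to treat as missing."""
    rng = random.Random(seed)
    n_test = round(frac * n_locations)
    # A fresh sample per outcome, so held-out locations differ across outcomes.
    return [sorted(rng.sample(range(n_locations), n_test))
            for _ in range(n_outcomes)]

def discard_burn_in(chain, n_burn):
    """Hypothetical helper: drop the first n_burn MCMC draws."""
    return chain[n_burn:]

# Toy sizes (assumed): 1,000 locations and 3 outcomes.
test_idx = hold_out_per_outcome(n_locations=1000, n_outcomes=3)

# Stand-in for a chain of 10,000 draws with the first 5,000 discarded.
kept = discard_burn_in(list(range(10_000)), n_burn=5_000)

print([len(ix) for ix in test_idx], len(kept))
```

Fixing the seed makes the split repeatable, which is the property the Dataset Splits row is checking for.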