A Bayesian Contiguous Partitioning Method for Learning Clustered Latent Variables

Authors: Zhao Tang Luo, Huiyan Sang, Bani Mallick

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we illustrate the performance of the model with simulation studies and a real data analysis of detecting the temperature-salinity relationship from water masses in the Atlantic Ocean."
Researcher Affiliation | Academia | "Zhao Tang Luo EMAIL Huiyan Sang EMAIL Bani Mallick EMAIL Department of Statistics, Texas A&M University, College Station, TX 77840, USA"
Pseudocode | Yes | "The RJ-MCMC algorithm is summarized in Algorithm 1 and detailed in Appendix B."
Open Source Code | No | "The code will be made publicly available upon publication."
Open Datasets | Yes | "The data of temperature and salinity is downloaded from National Oceanographic Data Center (https://www.nodc.noaa.gov/OC5/woa13/)."
Dataset Splits | No | The paper describes generating 1,000 spatial locations for the simulation studies and choosing a random sample of 5,130 spatial locations for the real data analysis, but it does not specify any training/validation/test splits for these datasets.
Hardware Specification | Yes | "All computations were performed on a Linux server with two 2.4GHz 14-core processors and 64GB of memory."
Software Dependencies | No | "We implement the BSCC method in R using the deldir package for the Delaunay triangulation, the igraph package for graph operations, and the ramcmc package for the Cholesky update/downdate. The implementation of the SCC method is adapted from the R package glmnet. The DPM model is implemented in R using the nimble code provided in Ma et al. (2020)."
Experiment Setup | Yes | "We consider four candidates α = 0.0075, 0.0150, 0.1000, 0.3333, which give c = 0.05, 0.1, 0.5, 0.9, respectively. The other hyperparameters are set to be a0 = b0 = 1 and c0 = d0 = 10^-6, and the standard deviation for the random walk proposal in the hyper step of our RJ-MCMC algorithm is chosen to be 0.9. For each simulated data set, we run d = 8 tempered chains in parallel with the lowest inverse temperature t_d = 0.35. We run each chain for 100,000 iterations, discarding the first 50,000. We set the thinning interval to be 20 iterations and the swap interval to be 100."
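The experiment-setup quote implies a concrete sampling budget. A minimal sketch of the arithmetic, assuming only the quoted settings (100,000 iterations, 50,000 burn-in, thinning every 20); the geometric spacing of the inverse-temperature ladder is an assumption for illustration, since the paper as quoted states only d = 8 chains with lowest inverse temperature t_d = 0.35:

```python
# Derive the retained posterior sample count and a plausible
# inverse-temperature ladder from the quoted RJ-MCMC settings.
n_iter, burn_in, thin = 100_000, 50_000, 20
n_chains, t_min = 8, 0.35

# Samples kept per chain after discarding burn-in and thinning.
n_samples = (n_iter - burn_in) // thin  # 2500 draws per chain

# Hypothetical geometric ladder from 1.0 (cold chain) down to t_min.
ladder = [t_min ** (i / (n_chains - 1)) for i in range(n_chains)]

print(n_samples)                # 2500
print(ladder[0], ladder[-1])    # 1.0 0.35
```

With these settings each tempered chain contributes 2,500 post-burn-in draws, and swaps between adjacent temperatures would be proposed every 100 iterations.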