Graph-accelerated Markov Chain Monte Carlo using Approximate Samples

Authors: Leo L. Duan, Anirban Bhattacharya

JMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate improved mixing performances for challenging problems, such as those involving multiple modes, non-convex density contour, or large-dimension latent variables. Section 4. Simulations Section 5. Application: Estimating Latent Gaussian Model for Power Outage Data
Researcher Affiliation Academia Leo L Duan EMAIL Department of Statistics University of Florida, USA Anirban Bhattacharya EMAIL Department of Statistics Texas A&M University, USA
Pseudocode Yes In each iteration, with probability (1 w), the sampler will update θ using K(θt, ), the baseline algorithm; with probability w, the sampler will use Q(θt, ) to take a graph jump consisting of the following steps: 1. (Project to a node) Find the projection of θt to one βj, j = N(θt) := arg minl βl θt . 2. (Walk on the graph) Draw a new node i uniformly from the ball B(j; r). 3. (Relaxation from βi) Draw a proposal θ from a relaxation distribution F(θ | βi, θt). 4. (Metropolis Hastings adjustment) Accept θ as θt+1 with probability α(θt, θ ) = min 1, Π(θ | y)|B(i; r)| 1F(θt | βj, θ ) Π(θt | y)|B(j; r)| 1F(θ | βi, θt) 1 N(θ ) = i ; (2) otherwise keep θt+1 as the same as θt.
Open Source Code Yes The software is hosted and maintained on github repository under the following link: https://github.com/leoduan/graph_acc_mcmc
Open Datasets No The paper describes a dataset: "We use the power outage count for a zip code area in the south of Florida collected during a 90-day time period in the 2009 hurricane season." However, it does not provide any concrete access information (link, DOI, repository, or citation) for this dataset to be publicly available.
Dataset Splits No The paper mentions simulated data sizes ("n {100, 500, 1000, 2000}") and describes the power outage count data, but does not provide specific train/test/validation dataset splits, methodology, or references to predefined splits for any of the data used in the experiments.
Hardware Specification Yes it takes about 2 seconds to find the minimum spanning tree for m = 1600 on a laptop with Apple Silicon processor.
Software Dependencies No The paper mentions software like "numpyro package (Phan et al., 2019)" and "JAX package (Bradbury et al., 2018)" and references "Abril-Pla et al. (2023) for its sampler implementation" (referring to PyMC), but it does not specify version numbers for these software dependencies within the text.
Experiment Setup Yes In each iteration, with probability (1 w), the sampler will update θ using K(θt, ), the baseline algorithm; with probability w, the sampler will use Q(θt, ) to take a graph jump For acceleration, we use a variational distribution β... and use the graph-accelerated algorithm in (1) with w = 0.3 and Gaussian for F. We run both the baseline and accelerated algorithms for 10,000 iterations Using random-walk Metropolis as the baseline algorithm, we tweak s to around 0.5 so that the Metropolis acceptance rate is around 0.234. We run the algorithm for 3000 iterations and use the last 2000 as a Markov chain sample. We run the accelerated algorithm for 2000 iterations with w = 0.5 and r = 3, and report the empirical acceptance rate as the number of accepted graph jump steps divided by 1000, as equal to 2000 * 0.5. In prior specification, we use h Inverse-Gamma(2, 1) for the bandwidth, r N(0, )(0, 1) half Gaussian for the inverse dispersion parameter, τ Inverse-Gamma(2, 1) for the scale. We run the baseline algorithm for 20,000 iterations, and treat the first 5,000 as burn-in. we construct the graph and run the accelerated algorithm for 20,000 iterations (with the first 5,000 as burn-in).