reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Graph-accelerated Markov Chain Monte Carlo using Approximate Samples

Authors: Leo L. Duan, Anirban Bhattacharya

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate improved mixing performances for challenging problems, such as those involving multiple modes, non-convex density contour, or large-dimension latent variables. Section 4. Simulations Section 5. Application: Estimating Latent Gaussian Model for Power Outage Data
Researcher Affiliation	Academia	Leo L Duan EMAIL Department of Statistics University of Florida, USA Anirban Bhattacharya EMAIL Department of Statistics Texas A&M University, USA
Pseudocode	Yes	In each iteration, with probability (1 w), the sampler will update θ using K(θt, ), the baseline algorithm; with probability w, the sampler will use Q(θt, ) to take a graph jump consisting of the following steps: 1. (Project to a node) Find the projection of θt to one βj, j = N(θt) := arg minl βl θt . 2. (Walk on the graph) Draw a new node i uniformly from the ball B(j; r). 3. (Relaxation from βi) Draw a proposal θ from a relaxation distribution F(θ \| βi, θt). 4. (Metropolis Hastings adjustment) Accept θ as θt+1 with probability α(θt, θ ) = min 1, Π(θ \| y)\|B(i; r)\| 1F(θt \| βj, θ ) Π(θt \| y)\|B(j; r)\| 1F(θ \| βi, θt) 1 N(θ ) = i ; (2) otherwise keep θt+1 as the same as θt.
Open Source Code	Yes	The software is hosted and maintained on github repository under the following link: https://github.com/leoduan/graph_acc_mcmc
Open Datasets	No	The paper describes a dataset: "We use the power outage count for a zip code area in the south of Florida collected during a 90-day time period in the 2009 hurricane season." However, it does not provide any concrete access information (link, DOI, repository, or citation) for this dataset to be publicly available.
Dataset Splits	No	The paper mentions simulated data sizes ("n {100, 500, 1000, 2000}") and describes the power outage count data, but does not provide specific train/test/validation dataset splits, methodology, or references to predefined splits for any of the data used in the experiments.
Hardware Specification	Yes	it takes about 2 seconds to find the minimum spanning tree for m = 1600 on a laptop with Apple Silicon processor.
Software Dependencies	No	The paper mentions software like "numpyro package (Phan et al., 2019)" and "JAX package (Bradbury et al., 2018)" and references "Abril-Pla et al. (2023) for its sampler implementation" (referring to PyMC), but it does not specify version numbers for these software dependencies within the text.
Experiment Setup	Yes	In each iteration, with probability (1 w), the sampler will update θ using K(θt, ), the baseline algorithm; with probability w, the sampler will use Q(θt, ) to take a graph jump For acceleration, we use a variational distribution β... and use the graph-accelerated algorithm in (1) with w = 0.3 and Gaussian for F. We run both the baseline and accelerated algorithms for 10,000 iterations Using random-walk Metropolis as the baseline algorithm, we tweak s to around 0.5 so that the Metropolis acceptance rate is around 0.234. We run the algorithm for 3000 iterations and use the last 2000 as a Markov chain sample. We run the accelerated algorithm for 2000 iterations with w = 0.5 and r = 3, and report the empirical acceptance rate as the number of accepted graph jump steps divided by 1000, as equal to 2000 * 0.5. In prior specification, we use h Inverse-Gamma(2, 1) for the bandwidth, r N(0, )(0, 1) half Gaussian for the inverse dispersion parameter, τ Inverse-Gamma(2, 1) for the scale. We run the baseline algorithm for 20,000 iterations, and treat the first 5,000 as burn-in. we construct the graph and run the accelerated algorithm for 20,000 iterations (with the first 5,000 as burn-in).