Bayesian Spanning Tree: Estimating the Backbone of the Dependence Graph

Authors: Leo L. Duan, David B. Dunson

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In both theory and experiments, we show that this model does not require the population graph to be a spanning tree or the covariance to satisfy assumptions beyond positive-definiteness. The model accurately recovers the backbone of the population graph at a rate competitive with existing approaches but with better robustness. We show combinatorial properties of the spanning tree, which may be of independent interest, and develop an efficient Gibbs sampler for Bayesian inference. Analyzing electroencephalography data using a hidden Markov model with each latent state modeled by a spanning tree, we show that results are much more interpretable compared with popular alternatives. Keywords: Graph-constrained Model, Incidence Matrix, Laplacian, Matrix Tree Theorem, Traveling Salesperson Problem.
Researcher Affiliation | Academia | Leo L. Duan (EMAIL), Department of Statistics, University of Florida; David B. Dunson (EMAIL), Department of Statistical Science, Duke University
Pseudocode | Yes | Algorithm 1: Finding the conditional posterior mode. Initialize U = {1}, E_T = ∅, and Ū = {1, ..., p} \ U; while |U| < p do: find (j, k) = argmax_{(l, l') ∈ (U, Ū)} q_{l,l'}, add (j, k) into E_T, and move k from Ū to U; end.
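Algorithm 1 above is a Prim-style greedy search: starting from a single node, it repeatedly adds the highest-weight edge crossing the cut between the tree and the remaining nodes. A minimal sketch of that procedure, assuming a symmetric matrix `q` of edge weights (the function name and input layout are hypothetical, not the authors' code):

```python
import numpy as np

def conditional_posterior_mode_tree(q):
    """Greedy (Prim-style) search for a maximum-weight spanning tree,
    mirroring Algorithm 1. `q` is assumed to be a symmetric (p x p)
    matrix of edge weights q[l, l'] (hypothetical input layout)."""
    p = q.shape[0]
    in_tree = {0}                    # U, seeded with node 1 (0-indexed here)
    out_tree = set(range(1, p))      # U-bar, the remaining nodes
    edges = []                       # E_T, the tree edges found so far
    while len(in_tree) < p:
        # Pick the highest-weight edge crossing the cut (U, U-bar).
        j, k = max(((l, m) for l in in_tree for m in out_tree),
                   key=lambda e: q[e[0], e[1]])
        edges.append((j, k))
        out_tree.remove(k)           # move the new endpoint into U
        in_tree.add(k)
    return edges
```

The loop runs p − 1 times and each iteration scans the current cut, so this naive version is O(p^3); a heap-based variant would be faster but obscures the correspondence with the pseudocode.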
Open Source Code | No | No explicit statement or link to their own implementation code is provided. The paper mentions using the 'scikit-learn package' for data generation and examples, but not for releasing their own method's code.
Open Datasets | Yes | Next, we move to the case of a graph formed near two manifolds. We use the two-moon example provided in the scikit-learn package. As shown in Figure 8, each point has only one or a few points in the neighborhood; therefore, the posterior distribution of T is highly concentrated near the posterior mode.
Dataset Splits | No | The paper mentions, 'To validate our trained model, we use a previously reserved set of EEG data collected from another 20 subjects (hence 40 time series),' which indicates a validation set, but it does not specify exact split percentages, sample counts, or a detailed splitting methodology for reproducibility.
Hardware Specification | Yes | In terms of computational time, for a graph containing p = 200 nodes, sampling 1,000 steps takes about 2 minutes using Python on a quad-core laptop.
Software Dependencies | No | The paper mentions using 'Python' and the 'scikit-learn package' but does not provide specific version numbers for either, which is required for reproducible software dependencies.
Experiment Setup | Yes | We choose α = 5 as a balanced choice between shrinkage and tail-robustness; details can be found in Armagan et al. (2013). We use an informative exponential prior τ ∼ Exp(1/µ_τ) with the prior mean set to µ_τ = min_{(j,k): j≠k} ‖y_j − y_k‖²/n, as an empirical estimate of the smallest scale among the vectors (y_j − y_k)'s. For the first step, we use an adaptation period at the beginning of the MCMC algorithm to tune δ > 0, so that the acceptance rate of this step is around 0.3. We set K = 20 as an upper bound. We run our MCMC algorithm for 20,000 iterations, and discard the initial 10,000 as a burn-in. We average the last 5,000 iterations to compute the likelihood...
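The empirical prior mean µ_τ quoted above is just the smallest squared pairwise distance among the data vectors, scaled by n. A minimal sketch, assuming the y_j are the columns of an (n × p) data matrix `Y` (the function name and input layout are hypothetical):

```python
import numpy as np

def empirical_mu_tau(Y):
    """Empirical estimate mu_tau = min_{j != k} ||y_j - y_k||^2 / n,
    where the y_j are assumed to be the columns of the (n x p) matrix Y."""
    n, p = Y.shape
    best = np.inf
    for j in range(p):
        for k in range(p):
            if j != k:
                # squared Euclidean distance between vectors y_j and y_k
                d2 = np.sum((Y[:, j] - Y[:, k]) ** 2)
                best = min(best, d2 / n)
    return best
```

This O(p^2 n) double loop suffices for moderate p (the paper's examples use p ≈ 200); a vectorized pairwise-distance computation would do the same job at scale.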