Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Low Tree-Rank Bayesian Vector Autoregression Models

Authors: Leo L Duan, Zeyu Yuwen, George Michailidis, Zhengwu Zhang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sections 5 and 6 illustrate the performance of tree-rank estimates on synthetic and resting-state functional magnetic resonance imaging data.
Researcher Affiliation | Academia | Leo L Duan (EMAIL), Department of Statistics, University of Florida; Zeyu Yuwen (EMAIL), Department of Statistics, University of Florida; George Michailidis (EMAIL), Department of Statistics, University of California at Los Angeles; Zhengwu Zhang (EMAIL), Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
Pseudocode | Yes | Algorithm 1: Find an upper bound estimate m̂ of Tree-Rank(G).
Open Source Code | Yes | The software is available at https://github.com/leoduan/Spanning-Tree-VAR.
Open Datasets | Yes | We employ the proposed model to analyze resting-state functional magnetic resonance imaging (fMRI) data from the Human Connectome Project.
Dataset Splits | Yes | We denote the first scan as the test batch and the second scan as the retest batch. As our study focuses on reproducibility, we use the test batch as the training data for graph estimation and the retest batch as validation data to assess how many edge estimates can be reproduced.
Hardware Specification | Yes | It takes about 4 minutes to run the MCMC algorithm for 1000 iterations at p = 30, and 10 minutes at p = 80 on a quad-core laptop.
Software Dependencies | No | The paper mentions software components such as the igraph function smallworld, but does not provide specific version numbers for any key software used in its implementation or experiments.
Experiment Setup | Yes | Next, we specify the hyper-parameters mentioned above. First, we standardize each vector (y_{1j}, ..., y_{Tj}), so that it has sample mean 0 and sample variance 1. This allows us to set the noise variance roughly on the same scale, σ²_ε ~ Gamma⁻¹(2, 1) and γ_W = 1. Next, for the generalized double Pareto distribution, we follow Armagan et al. (2013) and use α_η = 3 and γ_η = 0.001 to balance between sparsity and tail-robustness. To regularize the order of autoregression, we use a_k = 3 and b_k = 2 · 0.1^k, corresponding to increasingly smaller prior mean E[r_k] = 0.1^k and variance V[r_k] = 0.1^{2k} as k increases. For the union-of-trees prior distribution, we empirically find that having λ adaptive to the length of the time series T is effective to control the number of edges |E_T|, and we use λ = 0.1T in this article. For the parameter dimensions, we use d = m = 10. [...] We set the hyper-parameters according to the discussion in Section 2, with (m, d) = (10, 10). For comparison purposes, we also fit sparse VAR models using (i) shrinkage only, (ii) trees only, (iii) lasso regularization, and (iv) elastic net regularization. It takes about 10 minutes to run the MCMC algorithm for each Bayesian model, and about 2 minutes to run the optimization algorithm for lasso or elastic net regularization on a quad-core laptop. We form a point estimate Ĝ using the posterior mean (or the optimal value) Ĉ, then threshold it using the procedure described in Section 2.2. For the Bayesian models, we use the posterior mean of η in this step.
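The standardization step and hyper-parameter settings quoted above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names (`standardize_series`, `hyperparameters`) and dictionary keys are assumptions chosen to mirror the quoted notation.

```python
import numpy as np

def standardize_series(Y):
    """Standardize each coordinate series (column) to sample mean 0
    and sample variance 1, as in the quoted setup."""
    Y = np.asarray(Y, dtype=float)
    return (Y - Y.mean(axis=0)) / Y.std(axis=0)

def hyperparameters(T, K=3):
    """Collect the quoted hyper-parameter choices (names are illustrative).

    For an Inverse-Gamma(a_k, b_k) with a_k = 3 and b_k = 2 * 0.1**k:
      mean     = b_k / (a_k - 1)              = 0.1**k
      variance = b_k**2 / ((a_k-1)**2 (a_k-2)) = 0.1**(2k)
    matching the stated prior mean E[r_k] and variance V[r_k].
    """
    ks = np.arange(1, K + 1)
    return {
        "sigma2_eps_prior": ("inverse-gamma", 2.0, 1.0),  # sigma^2_eps ~ Gamma^{-1}(2, 1)
        "gamma_W": 1.0,
        "alpha_eta": 3.0,                                 # generalized double Pareto
        "gamma_eta": 1e-3,
        "a_k": np.full(K, 3.0),
        "b_k": 2.0 * 0.1 ** ks,                           # b_k = 2 * 0.1^k
        "lambda": 0.1 * T,                                # adaptive to series length T
        "d": 10,
        "m": 10,
    }
```

For example, with a series of length T = 1200 this gives λ = 120 and b_1 = 0.2, consistent with the quoted prior mean E[r_1] = 0.1.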