High-dimensional Gaussian graphical models on network-linked data

Authors: Tianxi Li, Cheng Qian, Elizaveta Levina, Ji Zhu

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network.
Researcher Affiliation | Academia | Tianxi Li (EMAIL), Department of Statistics, University of Virginia, Charlottesville, VA 22904, USA; Cheng Qian (EMAIL), School of Mathematics, Southeast University, Nanjing, Jiangsu 211189, China; Elizaveta Levina (EMAIL) and Ji Zhu (EMAIL), Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
Pseudocode | Yes | Algorithm 1 (Two-stage GNC-lasso algorithm). Input: a standardized data matrix X, network adjacency matrix A, tuning parameters λ and α. 1. Mean estimation: let L_s be the standardized Laplacian of A. Estimate the mean matrix by M̂ = argmin_M ‖X − M‖_F² + α tr(Mᵀ L_s M) (4). 2. Covariance estimation: let Ŝ = (1/n)(X − M̂)ᵀ(X − M̂) be the sample covariance matrix of X based on M̂. Estimate the precision matrix by Θ̂ = argmin_{Θ ∈ S₊} −log det(Θ) + tr(ΘŜ) + λ‖Θ‖_{1,off} (5).
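The two-stage algorithm above can be sketched in a few lines: step 1 is a ridge-type problem with the closed-form solution M̂ = (I + αL_s)⁻¹X, and step 2 is a standard graphical lasso applied to the residual covariance. This is a minimal sketch, not the authors' implementation; it assumes scikit-learn's `graphical_lasso` as the step-2 solver, and the degree clipping for isolated nodes is an added convenience.

```python
import numpy as np
from sklearn.covariance import graphical_lasso


def two_stage_gnc_lasso(X, A, lam, alpha):
    """Sketch of Algorithm 1 (two-stage GNC-lasso).

    X: n x p standardized data matrix; A: n x n network adjacency matrix;
    lam, alpha: the tuning parameters lambda and alpha from the paper.
    """
    n, p = X.shape

    # Standardized Laplacian L_s = I - D^{-1/2} A D^{-1/2}; degrees are
    # clipped at 1 so isolated nodes do not cause division by zero.
    d = np.clip(A.sum(axis=1), 1.0, None)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_s = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Step 1 (mean estimation): problem (4) has the closed form
    # M_hat = (I + alpha * L_s)^{-1} X.
    M_hat = np.linalg.solve(np.eye(n) + alpha * L_s, X)

    # Step 2 (covariance estimation): graphical lasso (5) on the
    # sample covariance of the centered data.
    R = X - M_hat
    S_hat = R.T @ R / n
    _, Theta_hat = graphical_lasso(S_hat, alpha=lam)
    return M_hat, Theta_hat
```

Because (4) is quadratic, step 1 reduces to a single linear solve per run, which is what makes the two-stage procedure cheap relative to a joint optimization.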
Open Source Code | No | The paper does not provide any explicit statements about code availability, nor does it include links to source code repositories.
Open Datasets | Yes | Here we apply the proposed method to the dataset of papers from 2003-2012 from four statistical journals collected by Ji and Jin (2016).
Dataset Splits | No | There is no explicit mention of specific training, validation, or test dataset splits. The paper mentions "10-fold cross-validation" for tuning parameters, but not fixed dataset splits for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions algorithms like "glasso" and methods like "K-means clustering" and the "gap method", but does not specify any software dependencies with version numbers (e.g., programming language versions, library versions).
Experiment Setup | Yes | Noise settings: the conditional dependence graph G in the Gaussian graphical model is generated as an Erdős-Rényi graph on p nodes, with each node pair connecting independently with probability 0.01. The Gaussian noise is then drawn from N(0, Σ), where Θ = Σ⁻¹ = a(0.3 A_G + (0.3 e_G + 0.1) I), A_G is the adjacency matrix of G, e_G is the absolute value of the smallest eigenvalue of A_G, and the scalar a is set to ensure the resulting Σ has all diagonal elements equal to 1. This procedure is implemented in Zhao et al. (2012). Mean settings: we set up the mean to allow for varying degrees of cohesion; each column M·,j, j = 1, 2, ..., p, is constructed from a vector u(j) randomly sampled with replacement from the Laplacian eigenvectors u_{n−1}, u_{n−2}, ..., u_{n−k} for some integer k, with t the mixing proportion. We then rescale M so the signal-to-noise ratio becomes 1.6. There are two tuning parameters, λ and α, in the two-stage GNC-lasso algorithm. The parameter α controls the amount of cohesion over the network in the estimated mean and can be easily tuned based on its predictive performance; in subsequent numerical examples, we always choose α from a sequence of candidate values by 10-fold cross-validation. To keep the graphs comparable and to allow for more interpretable results, we instead set the number of edges to 25 for both methods and compare the resulting graphs.
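The noise setting above is mechanical enough to sketch directly: draw an Erdős-Rényi graph, form Θ = 0.3 A_G + (0.3 e_G + 0.1) I, invert it, and standardize Σ to unit diagonal. This is a hedged reconstruction, not the authors' code; since a single scalar a cannot generally make every diagonal entry of Σ equal 1, the sketch standardizes Σ to a correlation matrix (the convention used by Zhao et al. (2012)'s implementation), which absorbs a.

```python
import numpy as np


def make_noise_covariance(p=100, prob=0.01, seed=0):
    """Sketch of the simulation noise setting: Sigma from an
    Erdos-Renyi conditional dependence graph G on p nodes."""
    rng = np.random.default_rng(seed)

    # Erdos-Renyi graph: each node pair connects independently
    # with probability prob.
    upper = np.triu(rng.random((p, p)) < prob, k=1)
    A_G = (upper | upper.T).astype(float)

    # e_G = |smallest eigenvalue of A_G|; the shift 0.3*e_G + 0.1
    # on the diagonal makes Theta positive definite.
    e_G = abs(np.linalg.eigvalsh(A_G).min())
    Theta = 0.3 * A_G + (0.3 * e_G + 0.1) * np.eye(p)

    # Standardize Sigma = Theta^{-1} to unit diagonal; this rescaling
    # plays the role of the scalar a in the paper's formula.
    Sigma = np.linalg.inv(Theta)
    d = np.sqrt(np.diag(Sigma))
    Sigma = Sigma / np.outer(d, d)
    return Sigma, np.linalg.inv(Sigma)
```

Noise for n samples is then drawn as `rng.multivariate_normal(np.zeros(p), Sigma, size=n)` and added to the cohesive mean matrix M.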