Learning a High-dimensional Linear Structural Equation Model via ℓ1-Regularized Regression
Authors: Gunwoong Park, Sang Jun Moon, Sion Park, Jong-June Jeon
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through simulations, we verify that the proposed algorithm is statistically consistent and computationally feasible, and it performs well compared to the state-of-the-art US, GDS, LISTEN and TD algorithms with our settings. We also demonstrate through real COVID-19 data that the proposed algorithm is well-suited to estimating a virus-spread map in China. Sections 4 and 5 evaluate our method and state-of-the-art algorithms using synthetic and real COVID-19 data. |
| Researcher Affiliation | Academia | All authors are affiliated with the Department of Statistics, University of Seoul. |
| Pseudocode | Yes | Algorithm 1: High-dimensional Linear SEM Learning Algorithm.<br>Input: n i.i.d. samples X_{1:n}. Output: estimated graph structure Ĝ = (V, Ê).<br>Set π̂_{p+1} = ∅;<br>for r = 1, 2, ..., p−1 do<br>&nbsp;&nbsp;for j ∈ V \ {π̂_{p+1}, ..., π̂_{p+2−r}} do<br>&nbsp;&nbsp;&nbsp;&nbsp;S_j(r) = V \ ({j} ∪ {π̂_{p+1}, ..., π̂_{p+2−r}});<br>&nbsp;&nbsp;&nbsp;&nbsp;estimate θ̂_j(r) via ℓ1-regularized regression in Equation (5);<br>&nbsp;&nbsp;&nbsp;&nbsp;estimate the conditional variance V̂ar(X_j | X_{S_j(r)}) using Equation (6);<br>&nbsp;&nbsp;end<br>&nbsp;&nbsp;determine the (p+1−r)-th element of the ordering: π̂_{p+1−r} = arg max_j V̂ar(X_j | X_{S_j(r)});<br>&nbsp;&nbsp;determine the parents of π̂_{p+1−r}: P̂a(π̂_{p+1−r}) = {k ∈ S_j(r) : [θ̂_{π̂_{p+1−r}}(r)]_k ≠ 0};<br>end<br>Return: estimated edge set Ê = ∪_{r ∈ {1,...,p−1}} {(k, π̂_{p+1−r}) : k ∈ P̂a(π̂_{p+1−r})} |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | The data were collected from the Coronavirus Resource Center of Johns Hopkins University & Medicine (https://coronavirus.jhu.edu/map.html). |
| Dataset Splits | No | The paper mentions collecting data from the Coronavirus Resource Center of Johns Hopkins University & Medicine and specifies the number of records and covariates (n=52 samples, p=31 covariates) and excluded regions. However, it does not provide explicit training, validation, or test dataset splits or cross-validation details for experimental reproduction. |
| Hardware Specification | No | The paper discusses computational complexity and average run times in Section 4.3 but does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions using ℓ1-regularized regression and compares algorithms like US, GDS, LISTEN, and TD. While it cites Friedman et al. (2009) for 'glmnet: Lasso and elastic-net regularized generalized linear models' as an R package, it does not explicitly state the specific software dependencies with version numbers used for its own implementation or experiments. |
| Experiment Setup | Yes | The regularization parameters for the proposed and LISTEN algorithms were set to 2√n. In addition, for the LISTEN algorithm, the hard-threshold parameter was set to half of the minimum absolute true edge weight, min_{j,k} \|β_{jk}\| / 2, using the true model information. Lastly, for the TD algorithm, the predetermined parameter q was always set to the true maximum indegree of the graph. For the US algorithm, Fisher's independence test was exploited with significance level α = 1 − Φ(0.5 n^{1/3}), where Φ(·) is the cumulative distribution function of the standard normal distribution. For the GDS algorithm, the initial graph was set to a random graph. |
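The ordering-and-parents loop quoted in the Pseudocode row above can be sketched in Python. This is not the authors' implementation: the lasso solver (scikit-learn's `Lasso`), the regularization value `lam`, and the nonzero-coefficient tolerance are all illustrative assumptions; the paper's Equations (5) and (6) are approximated by an ℓ1-penalized fit and its residual variance.

```python
import numpy as np
from sklearn.linear_model import Lasso

def learn_linear_sem(X, lam=None):
    """Sketch of Algorithm 1: recover a topological ordering and an edge set
    by repeatedly peeling off the vertex with the largest estimated
    conditional variance, then reading parents off the lasso coefficients.
    `lam` is a hypothetical default, not the paper's tuning rule."""
    n, p = X.shape
    if lam is None:
        lam = 2 * np.sqrt(np.log(p) / n)  # assumed choice for illustration
    order = []     # reverse topological ordering (terminal vertices first)
    parents = {}
    remaining = list(range(p))
    for _ in range(p - 1):
        best = None  # (j, cond_var, coef, S) for the current best candidate
        for j in remaining:
            S = [k for k in remaining if k != j]
            reg = Lasso(alpha=lam).fit(X[:, S], X[:, j])   # Eq. (5) analogue
            resid = X[:, j] - reg.predict(X[:, S])
            cond_var = resid.var()                          # Eq. (6) analogue
            if best is None or cond_var > best[1]:
                best = (j, cond_var, reg.coef_, S)
        j, _, coef, S = best
        order.append(j)
        parents[j] = [S[k] for k in range(len(S)) if abs(coef[k]) > 1e-8]
        remaining.remove(j)
    order.append(remaining[0])          # the last remaining vertex is a source
    parents[remaining[0]] = []
    edges = [(k, j) for j, pa in parents.items() for k in pa]
    return order[::-1], edges           # sources first, plus the edge set
```

On a toy two-variable SEM (X1 = 2·X0 + noise, equal error variances), the sketch recovers the ordering [0, 1] and the single edge (0, 1), since Var(X1 | X0) exceeds Var(X0 | X1).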
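The significance level quoted in the Experiment Setup row, α = 1 − Φ(0.5 n^{1/3}), is straightforward to compute; a small stdlib-only sketch (the `std_normal_cdf` helper is ours, built from `math.erf`):

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Significance level for the US algorithm's Fisher independence test,
# illustrated at the COVID-19 data's sample size n = 52.
n = 52
alpha = 1 - std_normal_cdf(0.5 * n ** (1 / 3))
```

For n = 52 this gives α ≈ 0.03, i.e. the test becomes stricter as the sample size grows.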