Learning a High-dimensional Linear Structural Equation Model via ℓ1-Regularized Regression
Authors: Gunwoong Park, Sang Jun Moon, Sion Park, Jong-June Jeon
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through simulations, we verify that the proposed algorithm is statistically consistent and computationally feasible, and it performs well compared to the state-of-the-art US, GDS, LISTEN and TD algorithms with our settings. We also demonstrate through real COVID-19 data that the proposed algorithm is well-suited to estimating a virus-spread map in China. Sections 4 and 5 evaluate our method and state-of-the-art algorithms using synthetic and real COVID-19 data. |
| Researcher Affiliation | Academia | All authors are affiliated with the Department of Statistics, University of Seoul. |
| Pseudocode | Yes | Algorithm 1: High-dimensional Linear SEM Learning Algorithm.<br>Input: n i.i.d. samples X_{1:n}. Output: estimated graph structure Ĝ = (V, Ê).<br>Set π̂_{p+1} = ∅;<br>for r = 1, 2, ..., p−1 do<br>&nbsp;&nbsp;for j ∈ V \ {π̂_{p+1}, ..., π̂_{p+2−r}} do<br>&nbsp;&nbsp;&nbsp;&nbsp;S_j(r) = V \ ({j} ∪ {π̂_{p+1}, ..., π̂_{p+2−r}});<br>&nbsp;&nbsp;&nbsp;&nbsp;estimate θ̂_j(r) via ℓ1-regularized regression in Equation (5);<br>&nbsp;&nbsp;&nbsp;&nbsp;estimate the conditional variance V̂ar(X_j | X_{S_j(r)}) using Equation (6);<br>&nbsp;&nbsp;end<br>&nbsp;&nbsp;determine the (p+1−r)-th element of the ordering: π̂_{p+1−r} = arg max_j V̂ar(X_j | X_{S_j(r)});<br>&nbsp;&nbsp;determine the parents of π̂_{p+1−r}: P̂a(π̂_{p+1−r}) = {k ∈ S_j(r) : [θ̂_{π̂_{p+1−r}}(r)]_k ≠ 0};<br>end<br>Return: estimated edge set Ê = ∪_{r ∈ {1,...,p−1}} {(k, π̂_{p+1−r}) : k ∈ P̂a(π̂_{p+1−r})} |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | The data were collected from the Coronavirus Resource Center of Johns Hopkins University & Medicine (https://coronavirus.jhu.edu/map.html). |
| Dataset Splits | No | The paper mentions collecting data from the Coronavirus Resource Center of Johns Hopkins University & Medicine and specifies the number of records and covariates (n=52 samples, p=31 covariates) and excluded regions. However, it does not provide explicit training, validation, or test dataset splits or cross-validation details for experimental reproduction. |
| Hardware Specification | No | The paper discusses computational complexity and average run times in Section 4.3 but does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used for running the experiments or simulations. |
| Software Dependencies | No | The paper mentions using ℓ1-regularized regression and compares algorithms like US, GDS, LISTEN, and TD. While it cites Friedman et al. (2009) for 'glmnet: Lasso and elastic-net regularized generalized linear models' as an R package, it does not explicitly state the specific software dependencies with version numbers used for its own implementation or experiments. |
| Experiment Setup | Yes | The regularization parameters for the proposed and LISTEN algorithms were set to 2√n. In addition, for the LISTEN algorithm, the hard-threshold parameter was set to half of the minimum absolute true edge weight, min_{j,k} \|β_{jk}\| / 2, using the true model information. Lastly, for the TD algorithm, the predetermined parameter q was always set to the true maximum indegree of the graph. For the US algorithm, Fisher's independence test was exploited with significance level α = 1 − Φ(0.5 n^{1/3}), where Φ(·) is the cumulative distribution function of the standard normal distribution. For the GDS algorithm, the initial graph was set to a random graph. |
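The ordering-and-parents loop quoted in the Pseudocode row above can be sketched in Python. This is not the authors' implementation: the lasso solver (scikit-learn's `Lasso`), the regularization value `lam`, and the nonzero-coefficient tolerance are all illustrative assumptions; the paper's Equations (5) and (6) are approximated by an ℓ1-penalized fit and its residual variance.

```python
import numpy as np
from sklearn.linear_model import Lasso

def learn_linear_sem(X, lam=None):
    """Sketch of Algorithm 1: recover a topological ordering and an edge set
    by repeatedly peeling off the vertex with the largest estimated
    conditional variance, then reading parents off the lasso coefficients.
    `lam` is a hypothetical default, not the paper's tuning rule."""
    n, p = X.shape
    if lam is None:
        lam = 2 * np.sqrt(np.log(p) / n)  # assumed choice for illustration
    order = []     # reverse topological ordering (terminal vertices first)
    parents = {}
    remaining = list(range(p))
    for _ in range(p - 1):
        best = None  # (j, cond_var, coef, S) for the current best candidate
        for j in remaining:
            S = [k for k in remaining if k != j]
            reg = Lasso(alpha=lam).fit(X[:, S], X[:, j])   # Eq. (5) analogue
            resid = X[:, j] - reg.predict(X[:, S])
            cond_var = resid.var()                          # Eq. (6) analogue
            if best is None or cond_var > best[1]:
                best = (j, cond_var, reg.coef_, S)
        j, _, coef, S = best
        order.append(j)
        parents[j] = [S[k] for k in range(len(S)) if abs(coef[k]) > 1e-8]
        remaining.remove(j)
    order.append(remaining[0])          # the last remaining vertex is a source
    parents[remaining[0]] = []
    edges = [(k, j) for j, pa in parents.items() for k in pa]
    return order[::-1], edges           # sources first, plus the edge set
```

On a toy two-variable SEM (X1 = 2·X0 + noise, equal error variances), the sketch recovers the ordering [0, 1] and the single edge (0, 1), since Var(X1 | X0) exceeds Var(X0 | X1).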
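The significance level quoted in the Experiment Setup row, α = 1 − Φ(0.5 n^{1/3}), is straightforward to compute; a small stdlib-only sketch (the `std_normal_cdf` helper is ours, built from `math.erf`):

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Significance level for the US algorithm's Fisher independence test,
# illustrated at the COVID-19 data's sample size n = 52.
n = 52
alpha = 1 - std_normal_cdf(0.5 * n ** (1 / 3))
```

For n = 52 this gives α ≈ 0.03, i.e. the test becomes stricter as the sample size grows.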