High-Dimensional Poisson Structural Equation Model Learning via $\ell_1$-Regularized Regression

Authors: Gunwoong Park, Sion Park

JMLR 2019

Reproducibility Variable Result LLM Response
Research Type Experimental We verify through simulations that our algorithm is statistically consistent in the high-dimensional p > n setting, and performs well compared to state-of-the-art ODS, GES, and MMHC algorithms. We also demonstrate through multivariate real count data that our MRS algorithm is well-suited to estimating DAG models for multivariate count data in comparison to other methods used for discrete data. Keywords: Bayesian Networks, Directed Acyclic Graph, Identifiability, Structure Learning, ℓ1-Regularization, Multivariate Count Distribution
Researcher Affiliation Academia Gunwoong Park, Department of Statistics, University of Seoul, Seoul, 02504, South Korea; Sion Park, Department of Statistics, University of Seoul, Seoul, 02504, South Korea
Pseudocode Yes Algorithm 1: Moments Ratio Scoring (MRS). Input: n i.i.d. samples, X1:n. Output: estimated ordering π̂ = (π̂1, ..., π̂p) and an edge structure Ê ⊆ V × V. Set π̂0 = ∅; for m = 1, 2, ..., p do: set S = {π̂1, ..., π̂m−1}; for j ∈ {1, 2, ..., p} \ S do: estimate θ̂S(j) for the ℓ1-regularized generalized linear model (9); calculate the score Ŝ(m, j) using Equation (8); end; the m-th element of the ordering is π̂m = arg minj Ŝ(m, j); the parents of π̂m are Pa(π̂m) = {k ∈ S | θ̂S(π̂m)k ≠ 0}; end. Return: the estimated edge set Ê = ∪m∈V {(k, π̂m) | k ∈ Pa(π̂m)}.
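The quoted pseudocode can be sketched in Python roughly as below. This is a minimal reconstruction, not the authors' implementation: the proximal-gradient solver is a generic stand-in for the ℓ1-regularized GLM of the paper's Equation (9), and `moments_ratio_score` is a hypothetical overdispersion-style score standing in for the paper's Equation (8), whose exact form is not reproduced here.

```python
import numpy as np

def l1_poisson_glm(X, y, lam=0.1, n_iter=200):
    """Fit an l1-regularized Poisson GLM (log link) by proximal gradient
    descent with backtracking; a stand-in for the paper's Equation (9)."""
    n, d = X.shape
    beta = np.zeros(d)

    def obj(b):
        eta = np.clip(X @ b, -20.0, 20.0)
        return np.mean(np.exp(eta) - y * eta) + lam * np.abs(b).sum()

    lr = 1.0
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -20.0, 20.0)
        grad = X.T @ (np.exp(eta) - y) / n
        while True:
            step = beta - lr * grad
            # soft-thresholding = proximal operator of the l1 penalty
            cand = np.sign(step) * np.maximum(np.abs(step) - lr * lam, 0.0)
            if obj(cand) <= obj(beta) or lr < 1e-8:
                break
            lr *= 0.5  # backtrack until the objective stops increasing
        beta = cand
    return beta

def moments_ratio_score(y, mu):
    """Hypothetical overdispersion score (NOT the paper's Equation (8)):
    for a correctly ordered Poisson node the conditional variance equals
    the conditional mean, so this ratio deviation should be near 0."""
    return abs(np.mean((y - mu) ** 2) / max(np.mean(mu), 1e-12) - 1.0)

def mrs(X, lam=0.1, tol=1e-3):
    """Sketch of Algorithm 1 (MRS): greedily append the node with the
    smallest score, then read its parents off the nonzero coefficients."""
    n, p = X.shape
    order, parents = [], {}
    remaining = list(range(p))
    while remaining:
        S = list(order)
        best = None
        for j in remaining:
            if S:
                beta = l1_poisson_glm(X[:, S], X[:, j], lam)
                mu = np.exp(np.clip(X[:, S] @ beta, -20.0, 20.0))
            else:  # first element of the ordering: no candidate parents
                beta = np.zeros(0)
                mu = np.full(n, X[:, j].mean())
            s = moments_ratio_score(X[:, j], mu)
            if best is None or s < best[0]:
                best = (s, j, beta)
        _, j, beta = best
        order.append(j)
        parents[j] = [S[k] for k in range(len(S)) if abs(beta[k]) > tol]
        remaining.remove(j)
    edges = [(k, j) for j, pa in parents.items() for k in pa]
    return order, edges
```

The greedy structure (outer loop over ordering positions, inner loop over remaining nodes) mirrors the pseudocode; the score and solver internals are assumptions.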
Open Source Code No The paper does not contain any explicit statements about providing open-source code or links to a code repository for the methodology described.
Open Datasets Yes Our original data set consists of 800 MLB player salary and batting statistics from the 2003 season (see R package Lahman in Friendly, 2017 for detailed information).
Dataset Splits Yes The MRS and ODS algorithms were implemented using ℓ1-regularized likelihood, where we used five-fold cross validation to choose the regularization parameters. Among values whose mean squared error was within two standard errors of the minimum mean squared error, we chose the minimum value for the moments ratio scores and the largest value for parent selection.
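The quoted selection rule can be sketched as follows. The helper name and the exact form of the two-standard-error band are assumptions about how the rule is applied; the paper does not give code for it.

```python
import numpy as np

def lambdas_within_2se(lams, cv_mse_mean, cv_mse_se):
    """Among candidate regularization parameters whose CV error lies
    within two standard errors of the minimum CV error, return the
    smallest (used for the moments ratio scores) and the largest
    (used for parent selection) — a sketch of the quoted rule."""
    lams = np.asarray(lams, dtype=float)
    mean = np.asarray(cv_mse_mean, dtype=float)
    se = np.asarray(cv_mse_se, dtype=float)
    i = int(np.argmin(mean))
    ok = mean <= mean[i] + 2.0 * se[i]  # the two-standard-error band
    cand = lams[ok]
    return cand.min(), cand.max()
```

Choosing the largest admissible lambda for parent selection favors a sparser coefficient vector, which is the natural choice when nonzero coefficients are read off as edges.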
Hardware Specification No The paper mentions running simulations and analyses but does not specify any particular hardware (e.g., CPU, GPU models, memory) used for these experiments. It only mentions 'R program' in the context of regenerating parameters and samples, not hardware.
Software Dependencies No The paper mentions using 'R package Lahman in Friendly, 2017' and 'R package XMRF (Wan et al., 2016)' but does not provide specific version numbers for R or these packages.
Experiment Setup Yes We generated the 200 samples with the same procedure specified in Section 4.1, but with the indegree constraint d = 2, except that the link function was the identity gj(η) = η and the range of parameters was θjk ∈ [−1.5, −0.5] ∪ [0.5, 1.5]. We note that the link function must be positive, but we allow negative values of θjk by randomly choosing θj ∈ [1, 10]. If any Poisson rate parameter was negative, we regenerated the parameters.
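The regeneration scheme described above can be sketched as a rejection-sampling loop. This is an assumed reading of the quote, not the authors' code: edges are taken as (parent, child) pairs, `order` is assumed topological, and the entire parameter set is redrawn whenever a negative Poisson rate occurs under the identity link.

```python
import numpy as np

def sample_params(edges, p, rng):
    """Edge weights from [-1.5, -0.5] U [0.5, 1.5], node baselines
    (theta_j) from [1, 10], as in the quoted setup."""
    theta_jk = {e: rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.5)
                for e in edges}
    theta_j = rng.uniform(1.0, 10.0, size=p)
    return theta_j, theta_jk

def sample_poisson_sem(edges, p, order, n, rng, max_tries=100):
    """Identity link: rate_j = theta_j + sum_k theta_jk * x_k over the
    parents k of j. If any realized rate is negative, regenerate the
    parameters and start over (rejection sampling)."""
    for _ in range(max_tries):
        theta_j, theta_jk = sample_params(edges, p, rng)
        X = np.zeros((n, p))
        ok = True
        for j in order:  # order is assumed topological, parents first
            rate = np.full(n, theta_j[j])
            for (k, child), w in theta_jk.items():
                if child == j:
                    rate = rate + w * X[:, k]
            if (rate < 0).any():
                ok = False
                break
            X[:, j] = rng.poisson(rate)
        if ok:
            return X, theta_j, theta_jk
    raise RuntimeError("no valid parameter set found")
```

With positive baselines and nonnegative counts, a draw fails only when some negative edge weight drags a rate below zero, so the loop terminates quickly in practice.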