High-Dimensional Poisson Structural Equation Model Learning via $\ell_1$-Regularized Regression
Authors: Gunwoong Park, Sion Park
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify through simulations that our algorithm is statistically consistent in the high-dimensional p > n setting, and performs well compared to state-of-the-art ODS, GES, and MMHC algorithms. We also demonstrate through multivariate real count data that our MRS algorithm is well-suited to estimating DAG models for multivariate count data in comparison to other methods used for discrete data. Keywords: Bayesian Networks, Directed Acyclic Graph, Identifiability, Structure Learning, ℓ1-Regularization, Multivariate Count Distribution |
| Researcher Affiliation | Academia | Gunwoong Park EMAIL Department of Statistics University of Seoul Seoul, 02504, South Korea Sion Park EMAIL Department of Statistics University of Seoul Seoul, 02504, South Korea |
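| Pseudocode | Yes | Algorithm 1: Moments Ratio Scoring (MRS). Input: n i.i.d. samples, $X^{1:n}$. Output: Estimated ordering $\hat{\pi} = (\hat{\pi}_1, \dots, \hat{\pi}_p)$ and an edge structure, $\hat{E} \subset V \times V$. Set $\hat{\pi}_0 = \emptyset$; for $m = \{1, 2, \dots, p\}$ do: Set $S = \{\hat{\pi}_1, \dots, \hat{\pi}_{m-1}\}$; for $j \in \{1, 2, \dots, p\} \setminus S$ do: Estimate $\hat{\theta}_S(j)$ for ℓ1-regularized generalized linear model (9); Calculate scores $\widehat{\mathcal{S}}(m, j)$ using Equation (8); end. The mth element of the ordering, $\hat{\pi}_m = \arg\min_j \widehat{\mathcal{S}}(m, j)$; The parents of the mth element of the ordering, $\widehat{\mathrm{Pa}}(\hat{\pi}_m) = \{k \in S \mid \hat{\theta}^{S}_{\hat{\pi}_m k} \neq 0\}$; end. Return: Estimate the edge set, $\hat{E} = \cup_{m \in V} \{(k, \hat{\pi}_m) \mid k \in \widehat{\mathrm{Pa}}(\hat{\pi}_m)\}$ |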
| Open Source Code | No | The paper does not contain any explicit statements about providing open-source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | Our original data set consists of 800 MLB player salary and batting statistics from the 2003 season (see R package Lahman in Friendly, 2017 for detailed information). |
| Dataset Splits | Yes | The MRS and ODS algorithms were implemented using ℓ1-regularized likelihood where we used five-fold cross validation to choose the regularization parameters. Where mean squared error was within two standard errors of the minimum mean squared error, we chose the minimum value for the moments ratio scores and the largest value for parent selection. |
| Hardware Specification | No | The paper mentions running simulations and analyses but does not specify any particular hardware (e.g., CPU, GPU models, memory) used for these experiments. It only mentions 'R program' in the context of regenerating parameters and samples, not hardware. |
| Software Dependencies | No | The paper mentions using 'R package Lahman in Friendly, 2017' and 'R package XMRF (Wan et al., 2016)' but does not provide specific version numbers for R or these packages. |
| Experiment Setup | Yes | We generated the 200 samples with the same procedure specified in Section 4.1, but with the indegree constraint d = 2, and except that the link function was the identity, $g_j(\eta) = \eta$, and the range of parameters was $\theta_{jk} \in [-1.5, -0.5] \cup [0.5, 1.5]$. We note that the link function must be positive, but we allow negative values of $\theta_{jk}$ by randomly choosing $\theta_j \in [1, 10]$. If any Poisson rate parameter was negative, we regenerated the parameters. |
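
The data-generating procedure quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code (the paper reports using R and releases no code), and several details are assumptions made here for illustration only: the function name `sample_poisson_sem`, the default of p = 5 nodes, the uniformly random ordering, the choice of exactly min(d, position) parents per node, and the `max_tries` cap. The graph construction of Section 4.1 is not reproduced in the excerpt, so this part in particular is a stand-in.

```python
import numpy as np


def sample_poisson_sem(n=200, p=5, d=2, seed=0, max_tries=10000):
    """Draw n samples from a Poisson SEM with identity link (illustrative sketch).

    Each node j follows Poisson(theta_j + sum_k theta_jk * X_k) over its parents k.
    Graph construction is an assumption; only the parameter ranges, the
    indegree bound d, and the regenerate-on-negative-rate rule come from the text.
    """
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        order = rng.permutation(p)                # assumed: random causal ordering
        theta0 = rng.uniform(1.0, 10.0, size=p)   # intercepts theta_j in [1, 10]
        W = np.zeros((p, p))                      # W[j, k] = theta_jk (parent k -> node j)
        for pos, j in enumerate(order):
            n_par = min(d, pos)                   # assumed: as many parents as allowed
            if n_par == 0:
                continue
            parents = rng.choice(order[:pos], size=n_par, replace=False)
            magnitude = rng.uniform(0.5, 1.5, size=n_par)
            sign = rng.choice([-1.0, 1.0], size=n_par)
            W[j, parents] = sign * magnitude      # theta_jk in [-1.5,-0.5] U [0.5,1.5]

        X = np.zeros((n, p))
        valid = True
        for j in order:                           # sample nodes in causal order
            rate = theta0[j] + X @ W[j]           # identity link: rate is linear in parents
            if np.any(rate < 0):                  # negative rate: regenerate parameters
                valid = False
                break
            X[:, j] = rng.poisson(rate)
        if valid:
            return X, W, theta0, order
    raise RuntimeError("no valid parameter draw found within max_tries")
```

Calling `sample_poisson_sem(n=200, p=5, d=2)` returns the count matrix together with the coefficient matrix, intercepts, and ordering. Because negative coefficients often push some rate below zero, many parameter draws are rejected, which is why the sketch loops until a valid draw appears.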