High-Dimensional Poisson Structural Equation Model Learning via $\ell_1$-Regularized Regression

Authors: Gunwoong Park, Sion Park

JMLR 2019

Reproducibility Variable Result LLM Response
Research Type Experimental We verify through simulations that our algorithm is statistically consistent in the high-dimensional p > n setting, and performs well compared to state-of-the-art ODS, GES, and MMHC algorithms. We also demonstrate through multivariate real count data that our MRS algorithm is well-suited to estimating DAG models for multivariate count data in comparison to other methods used for discrete data. Keywords: Bayesian Networks, Directed Acyclic Graph, Identifiability, Structure Learning, ℓ1-Regularization, Multivariate Count Distribution
Researcher Affiliation Academia Gunwoong Park, Department of Statistics, University of Seoul, Seoul, 02504, South Korea; Sion Park, Department of Statistics, University of Seoul, Seoul, 02504, South Korea
Pseudocode Yes Algorithm 1: Moments Ratio Scoring (MRS). Input: n i.i.d. samples, X1:n. Output: estimated ordering π̂ = (π̂1, ..., π̂p) and an edge structure Ê ⊆ V × V. Set π̂0 = ∅; for m = 1, 2, ..., p do: set S = {π̂1, ..., π̂m−1}; for j ∈ {1, 2, ..., p} \ S do: estimate θ̂S(j) for the ℓ1-regularized generalized linear model (9); calculate the score Ŝ(m, j) using Equation (8); end; the m-th element of the ordering is π̂m = arg minj Ŝ(m, j); the parents of π̂m are Pa(π̂m) = {k ∈ S | θ̂S(π̂m)k ≠ 0}; end. Return: the estimated edge set Ê = ∪m∈V {(k, π̂m) | k ∈ Pa(π̂m)}.
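The quoted pseudocode can be sketched in Python roughly as below. This is a minimal reconstruction, not the authors' implementation: the proximal-gradient solver is a generic stand-in for the ℓ1-regularized GLM of the paper's Equation (9), and `moments_ratio_score` is a hypothetical overdispersion-style score standing in for the paper's Equation (8), whose exact form is not reproduced here.

```python
import numpy as np

def l1_poisson_glm(X, y, lam=0.1, n_iter=200):
    """Fit an l1-regularized Poisson GLM (log link) by proximal gradient
    descent with backtracking; a stand-in for the paper's Equation (9)."""
    n, d = X.shape
    beta = np.zeros(d)

    def obj(b):
        eta = np.clip(X @ b, -20.0, 20.0)
        return np.mean(np.exp(eta) - y * eta) + lam * np.abs(b).sum()

    lr = 1.0
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -20.0, 20.0)
        grad = X.T @ (np.exp(eta) - y) / n
        while True:
            step = beta - lr * grad
            # soft-thresholding = proximal operator of the l1 penalty
            cand = np.sign(step) * np.maximum(np.abs(step) - lr * lam, 0.0)
            if obj(cand) <= obj(beta) or lr < 1e-8:
                break
            lr *= 0.5  # backtrack until the objective stops increasing
        beta = cand
    return beta

def moments_ratio_score(y, mu):
    """Hypothetical overdispersion score (NOT the paper's Equation (8)):
    for a correctly ordered Poisson node the conditional variance equals
    the conditional mean, so this ratio deviation should be near 0."""
    return abs(np.mean((y - mu) ** 2) / max(np.mean(mu), 1e-12) - 1.0)

def mrs(X, lam=0.1, tol=1e-3):
    """Sketch of Algorithm 1 (MRS): greedily append the node with the
    smallest score, then read its parents off the nonzero coefficients."""
    n, p = X.shape
    order, parents = [], {}
    remaining = list(range(p))
    while remaining:
        S = list(order)
        best = None
        for j in remaining:
            if S:
                beta = l1_poisson_glm(X[:, S], X[:, j], lam)
                mu = np.exp(np.clip(X[:, S] @ beta, -20.0, 20.0))
            else:  # first element of the ordering: no candidate parents
                beta = np.zeros(0)
                mu = np.full(n, X[:, j].mean())
            s = moments_ratio_score(X[:, j], mu)
            if best is None or s < best[0]:
                best = (s, j, beta)
        _, j, beta = best
        order.append(j)
        parents[j] = [S[k] for k in range(len(S)) if abs(beta[k]) > tol]
        remaining.remove(j)
    edges = [(k, j) for j, pa in parents.items() for k in pa]
    return order, edges
```

The greedy structure (outer loop over ordering positions, inner loop over remaining nodes) mirrors the pseudocode; the score and solver internals are assumptions.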
Open Source Code No The paper does not contain any explicit statements about providing open-source code or links to a code repository for the methodology described.
Open Datasets Yes Our original data set consists of 800 MLB player salary and batting statistics from the 2003 season (see R package Lahman in Friendly, 2017 for detailed information).
Dataset Splits Yes The MRS and ODS algorithms were implemented using ℓ1-regularized likelihood, where we used five-fold cross validation to choose the regularization parameters. Among values whose mean squared error was within two standard errors of the minimum mean squared error, we chose the minimum value for the moments ratio scores and the largest value for parent selection.
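The quoted selection rule can be sketched as follows. The helper name and the exact form of the two-standard-error band are assumptions about how the rule is applied; the paper does not give code for it.

```python
import numpy as np

def lambdas_within_2se(lams, cv_mse_mean, cv_mse_se):
    """Among candidate regularization parameters whose CV error lies
    within two standard errors of the minimum CV error, return the
    smallest (used for the moments ratio scores) and the largest
    (used for parent selection) — a sketch of the quoted rule."""
    lams = np.asarray(lams, dtype=float)
    mean = np.asarray(cv_mse_mean, dtype=float)
    se = np.asarray(cv_mse_se, dtype=float)
    i = int(np.argmin(mean))
    ok = mean <= mean[i] + 2.0 * se[i]  # the two-standard-error band
    cand = lams[ok]
    return cand.min(), cand.max()
```

Choosing the largest admissible lambda for parent selection favors a sparser coefficient vector, which is the natural choice when nonzero coefficients are read off as edges.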
Hardware Specification No The paper mentions running simulations and analyses but does not specify any particular hardware (e.g., CPU, GPU models, memory) used for these experiments. It only mentions 'R program' in the context of regenerating parameters and samples, not hardware.
Software Dependencies No The paper mentions using 'R package Lahman in Friendly, 2017' and 'R package XMRF (Wan et al., 2016)' but does not provide specific version numbers for R or these packages.
Experiment Setup Yes We generated the 200 samples with the same procedure specified in Section 4.1, but with the indegree constraint d = 2, except that the link function was the identity gj(η) = η and the range of parameters was θjk ∈ [−1.5, −0.5] ∪ [0.5, 1.5]. We note that the link function must be positive, but we allow negative values of θjk by randomly choosing θj ∈ [1, 10]. If any Poisson rate parameter was negative, we regenerated the parameters.
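The regeneration scheme described above can be sketched as a rejection-sampling loop. This is an assumed reading of the quote, not the authors' code: edges are taken as (parent, child) pairs, `order` is assumed topological, and the entire parameter set is redrawn whenever a negative Poisson rate occurs under the identity link.

```python
import numpy as np

def sample_params(edges, p, rng):
    """Edge weights from [-1.5, -0.5] U [0.5, 1.5], node baselines
    (theta_j) from [1, 10], as in the quoted setup."""
    theta_jk = {e: rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.5)
                for e in edges}
    theta_j = rng.uniform(1.0, 10.0, size=p)
    return theta_j, theta_jk

def sample_poisson_sem(edges, p, order, n, rng, max_tries=100):
    """Identity link: rate_j = theta_j + sum_k theta_jk * x_k over the
    parents k of j. If any realized rate is negative, regenerate the
    parameters and start over (rejection sampling)."""
    for _ in range(max_tries):
        theta_j, theta_jk = sample_params(edges, p, rng)
        X = np.zeros((n, p))
        ok = True
        for j in order:  # order is assumed topological, parents first
            rate = np.full(n, theta_j[j])
            for (k, child), w in theta_jk.items():
                if child == j:
                    rate = rate + w * X[:, k]
            if (rate < 0).any():
                ok = False
                break
            X[:, j] = rng.poisson(rate)
        if ok:
            return X, theta_j, theta_jk
    raise RuntimeError("no valid parameter set found")
```

With positive baselines and nonnegative counts, a draw fails only when some negative edge weight drags a rate below zero, so the loop terminates quickly in practice.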