reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Causal Discovery with Generalized Linear Models through Peeling Algorithms

Authors: Minjie Wang, Xiaotong Shen, Wei Pan

JMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Furthermore, the article presents numerical experiments showcasing the effectiveness of our approach in comparison to state-of-the-art structure learning methods without confounders. Lastly, it demonstrates an application to Alzheimer's disease (AD), highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks involving Single Nucleotide Polymorphisms (SNPs) for healthy and AD subjects.
Researcher Affiliation	Academia	Minjie Wang EMAIL Department of Mathematics and Statistics Binghamton University, State University of New York Binghamton, NY 13902, USA; Xiaotong Shen EMAIL School of Statistics University of Minnesota Minneapolis, MN 55455, USA; Wei Pan EMAIL Division of Biostatistics University of Minnesota Minneapolis, MN 55455, USA
Pseudocode	Yes	Algorithm 1 summarizes the DC algorithm for solving nonconvex minimization (5). Algorithm 2 summarizes the peeling process for identifying all ancestral relationships or the causal order among primary variables. Algorithm 3 summarizes the peeling process for identifying parent-child relationships using the proposed deconfounders. Algorithm 4 serves as a general version of Algorithm 3 in the main paper for estimating parent-child relationships in the presence of confounders. Algorithm 5: Peeling algorithm for estimating parent-child relationships in the presence of confounders using GLMM and DRI. Algorithm 6: Peeling algorithm for estimating parent-child relationships via DPS.
Open Source Code	Yes	The R implementation is available at https://github.com/minjie-wang/GAMPI.
Open Datasets	Yes	This section applies GAMPI to a publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our goal is to estimate a regulatory gene expression network of a subset of genes related to Alzheimer's disease (AD) and identify which of the genes have a direct causal effect on AD through gene-to-gene and gene-to-AD regulatory networks. First, we download the raw data from the ADNI website (https://adni.loni.usc.edu)... In addition, from the KEGG database (Kanehisa et al. 2002), we extract the AD reference pathway (hsa05010, https://www.genome.jp/pathway/hsa05010)
Dataset Splits	Yes	In practice, we use either 5-fold cross-validation or the extended Bayesian information criterion (EBIC) to choose (τj, Kj). ... We treat the 247 CN individuals as the control group and the remaining 465 AD and MCI individuals as the case group.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) are provided for running the experiments.
Software Dependencies	No	The R implementation is available at https://github.com/minjie-wang/GAMPI. However, no specific version numbers for R or any R packages are provided.
Experiment Setup	Yes	For NOTEARS and DAGMA, we use the loss type that is appropriate for the data type of the outcome variables. ... with GAMPI employing EBIC for tuning parameter selection and NOTEARS applying the default value of 0.1. ... In practice, we use either 5-fold cross-validation or the extended Bayesian information criterion (EBIC) to choose (τj, Kj). We recommend EBIC due to its computational efficiency and strong empirical performance.
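
The Experiment Setup entry mentions choosing (τj, Kj) by EBIC. As a rough illustration of what EBIC-based selection over a grid of candidate settings can look like, here is a minimal Python sketch. It is not the GAMPI implementation (which is in R). It assumes the commonly used form EBIC = -2·loglik + k·log(n) + 2·γ·k·log(p), and the function names, the `fits` dictionary, and all numeric values are hypothetical placeholders rather than results from the paper.

```python
import math


def ebic(log_likelihood, n_params, n_samples, n_candidates, gamma=0.5):
    """Extended BIC in a commonly used form (an assumption, not taken from the paper):
    -2*loglik + k*log(n) + 2*gamma*k*log(p).
    With gamma = 0 this reduces to the ordinary BIC."""
    return (
        -2.0 * log_likelihood
        + n_params * math.log(n_samples)
        + 2.0 * gamma * n_params * math.log(n_candidates)
    )


def select_by_ebic(fits, n_samples, n_candidates, gamma=0.5):
    """Pick the tuning-parameter setting with the smallest EBIC.

    fits maps each candidate setting, e.g. a (tau, K) pair, to a tuple
    (log_likelihood, number_of_nonzero_parameters) from a fitted model.
    """
    scores = {
        setting: ebic(loglik, k, n_samples, n_candidates, gamma)
        for setting, (loglik, k) in fits.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fitted models for three (tau, K) settings; values are made up.
fits = {
    (0.1, 1): (-520.0, 1),
    (0.1, 3): (-500.0, 3),
    (0.1, 6): (-498.0, 6),
}

best, scores = select_by_ebic(fits, n_samples=200, n_candidates=50)
print("selected (tau, K):", best)
```

Unlike 5-fold cross-validation, this kind of selection needs only one fit per candidate setting, which is consistent with the computational-efficiency argument quoted above.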