Causal Discovery with Generalized Linear Models through Peeling Algorithms

Authors: Minjie Wang, Xiaotong Shen, Wei Pan

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, the article presents numerical experiments showcasing the effectiveness of our approach in comparison to state-of-the-art structure learning methods without confounders. Lastly, it demonstrates an application to Alzheimer's disease (AD), highlighting the method's utility in constructing gene-to-gene and gene-to-disease regulatory networks involving Single Nucleotide Polymorphisms (SNPs) for healthy and AD subjects.
Researcher Affiliation | Academia | Minjie Wang (EMAIL), Department of Mathematics and Statistics, Binghamton University, State University of New York, Binghamton, NY 13902, USA; Xiaotong Shen (EMAIL), School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA; Wei Pan (EMAIL), Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
Pseudocode | Yes | Algorithm 1 summarizes the DC algorithm for solving nonconvex minimization (5). Algorithm 2 summarizes the peeling process for identifying all ancestral relationships or the causal order among primary variables. Algorithm 3 summarizes the peeling process for identifying parent-child relationships using the proposed deconfounders. Algorithm 4 serves as a general version of Algorithm 3 in the main paper for estimating parent-child relationships in the presence of confounders. Algorithm 5: Peeling algorithm for estimating parent-child relationships in the presence of confounders using GLMM and DRI. Algorithm 6: Peeling algorithm for estimating parent-child relationships via DPS.
Open Source Code | Yes | The R implementation is available at https://github.com/minjie-wang/GAMPI.
Open Datasets | Yes | This section applies GAMPI to a publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our goal is to estimate a regulatory gene expression network of a subset of genes related to Alzheimer's disease (AD) and identify which of the genes have a direct causal effect on AD through gene-to-gene and gene-to-AD regulatory networks. First, we download the raw data from the ADNI website (https://adni.loni.usc.edu)... In addition, from the KEGG database (Kanehisa et al. 2002), we extract the AD reference pathway (hsa05010, https://www.genome.jp/pathway/hsa05010)
Dataset Splits | Yes | In practice, we use either 5-fold cross-validation or the extended Bayesian information criterion (EBIC) to choose (τj, Kj). ... We treat the 247 CN individuals as the control group and the remaining 465 AD and MCI individuals as the case group.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) are provided for running the experiments.
Software Dependencies | No | The R implementation is available at https://github.com/minjie-wang/GAMPI. However, no specific version numbers for R or any R packages are provided.
Experiment Setup | Yes | For NOTEARS and DAGMA, we use the loss type that is appropriate for the data type of the outcome variables. ... with GAMPI employing EBIC for tuning parameter selection and NOTEARS applying the default value of 0.1. ... In practice, we use either 5-fold cross-validation or the extended Bayesian information criterion (EBIC) to choose (τj, Kj). We recommend EBIC due to its computational efficiency and strong empirical performance.
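
The EBIC-based choice of (τj, Kj) mentioned above can be illustrated with a minimal sketch. This is not the GAMPI implementation (which is in R), and the summary here does not give the paper's exact criterion. The sketch assumes the commonly used form EBIC = -2 log L + k log n + 2 γ k log p, where k is the number of selected parameters, n the sample size, and p the number of candidate predictors. The function names, the value γ = 0.5, the candidate count of 50, and the fitted log-likelihoods are all hypothetical. The sample size 712 is simply 247 + 465 from the Dataset Splits row.

```python
import math


def ebic(log_likelihood, n_params, n_samples, n_candidates, gamma=0.5):
    """Extended BIC in the assumed form -2*logL + k*log(n) + 2*gamma*k*log(p)."""
    return (
        -2.0 * log_likelihood
        + n_params * math.log(n_samples)
        + 2.0 * gamma * n_params * math.log(n_candidates)
    )


def select_by_ebic(fits, n_samples, n_candidates, gamma=0.5):
    """Return the tuning setting whose fit has the smallest EBIC.

    fits: iterable of (setting, log_likelihood, n_params) tuples.
    """
    best = min(
        fits,
        key=lambda f: ebic(f[1], f[2], n_samples, n_candidates, gamma),
    )
    return best[0]


# Hypothetical fits for three (tau, K) settings: (setting, log-likelihood, k).
fits = [
    ((0.1, 1), -520.0, 1),
    ((0.1, 3), -480.0, 3),
    ((0.1, 8), -475.0, 8),
]

chosen = select_by_ebic(fits, n_samples=712, n_candidates=50)
print(chosen)
```

Unlike 5-fold cross-validation, this selection needs only one fit per candidate setting, which is consistent with the computational-efficiency argument quoted in the Experiment Setup row.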