Model-based Causal Discovery for Zero-Inflated Count Data

Authors: Junsouk Choi, Yang Ni

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive synthetic experiments as well as a real dataset with known ground truth demonstrate the superior performance of the proposed method against state-of-the-art alternative methods in discovering causal structure from observational zero-inflated count data. An application of reverse-engineering a gene regulatory network from a single-cell RNA-sequencing dataset illustrates the utility of ZiG-DAGs in practice.
Researcher Affiliation | Academia | Junsouk Choi, EMAIL, Department of Statistics, Texas A&M University, College Station, TX 98195-4322, USA. Yang Ni, EMAIL, Department of Statistics, Texas A&M University, College Station, TX 94720-1776, USA.
Pseudocode | Yes | Algorithm 1 Hill climbing. Algorithm 2 Tabu search.
Open Source Code | Yes | The R implementation of the proposed method is available in the R package ZiGDAG (https://github.com/junsoukchoi/ZiGDAG.git).
Open Datasets | Yes | We illustrate the utility of the proposed ZiG-DAG by performing two analyses of a scRNAseq dataset (Li et al., 2017) that consists of 561 cells from 11 primary colorectal cancer (CRC) tumors and matched normal mucosa. First, from the TRRUST database (Han et al., 2018), we extract a list of literature-curated pairs of transcription factor and its target.
Dataset Splits | No | The paper mentions data generation parameters and sample sizes for synthetic data, e.g., "We sample data from the linear ZiG-DAG with different sample sizes n ∈ {250, 500, 1000, 2000}". For real data, it mentions filtering cells and retaining 472 cells (Section 6.2). However, it does not specify any training/testing/validation splits for reproducibility of experimental evaluation.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU models, GPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The R implementation of the proposed method is available in the R package ZiGDAG (https://github.com/junsoukchoi/ZiGDAG.git). In our experiments, MRS utilizes the R package MXM to estimate the skeleton of DAG. We filter cell doublets and multiplets using an R package for single cell genomics, Seurat (Hao et al., 2021). The paper mentions software packages like 'R package ZiGDAG', 'R package MXM', and 'Seurat' but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | For each simulation setting, we set the causal DAG G by randomly generating a sparse DAG with d edges. Given the DAG, we generate coefficients (α_jk, β_jk) in (4) from independent uniform distributions: α_jk ∼ U(0.5, 2) and β_jk ∼ U(−2, −0.5) for k ∈ pa_G(j) and j ∈ V. The intercepts δ_j and γ_j in (4) are chosen uniformly at random from (−1.5, −1) and (1, 1.5), respectively. The additional parameters ψ_j for the GHPD (hyper-Poisson distribution) are sampled as log(ψ_j) ∼ U(−2, 2). For learning the nonlinear ZiG-DAG, we use M_f = M_g = 4 spline basis with a knot being placed at the 50% quantile of the data.
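
The parameter sampling described in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R implementation. The function names `random_sparse_dag` and `sample_parameters` are hypothetical. The number of nodes is passed separately from the number of edges. The negative bounds (for β_jk, δ_j, and the lower end of log ψ_j) assume that minus signs were dropped when the text was extracted. The DAG is drawn by orienting edges along a random node ordering, which is one common way to guarantee acyclicity and may differ from the paper's generator.

```python
import math
import random

# Sampling ranges. The negative bounds are an assumption: the extracted text
# appears to have lost its minus signs.
ALPHA_RANGE = (0.5, 2.0)
BETA_RANGE = (-2.0, -0.5)
DELTA_RANGE = (-1.5, -1.0)
GAMMA_RANGE = (1.0, 1.5)
LOG_PSI_RANGE = (-2.0, 2.0)


def random_sparse_dag(n_nodes, n_edges, rng):
    """Return n_edges distinct (parent, child) pairs that form a DAG."""
    order = list(range(n_nodes))
    rng.shuffle(order)
    # Orienting every edge from an earlier to a later node in a random
    # ordering rules out directed cycles.
    candidates = [
        (order[a], order[b])
        for a in range(n_nodes)
        for b in range(a + 1, n_nodes)
    ]
    return rng.sample(candidates, n_edges)


def sample_parameters(n_nodes, edges, rng):
    """Draw edge coefficients and per-node parameters from uniform ranges."""
    # Keys are (j, k) with k a parent of j, matching the subscripts
    # alpha_jk and beta_jk.
    alpha = {(j, k): rng.uniform(*ALPHA_RANGE) for (k, j) in edges}
    beta = {(j, k): rng.uniform(*BETA_RANGE) for (k, j) in edges}
    delta = [rng.uniform(*DELTA_RANGE) for _ in range(n_nodes)]
    gamma = [rng.uniform(*GAMMA_RANGE) for _ in range(n_nodes)]
    # psi_j is sampled on the log scale, so exponentiate to recover psi_j.
    psi = [math.exp(rng.uniform(*LOG_PSI_RANGE)) for _ in range(n_nodes)]
    return {"alpha": alpha, "beta": beta, "delta": delta,
            "gamma": gamma, "psi": psi}


if __name__ == "__main__":
    rng = random.Random(2023)  # fixed seed so a run can be repeated
    edges = random_sparse_dag(n_nodes=10, n_edges=10, rng=rng)
    params = sample_parameters(10, edges, rng)
    print(len(edges), "edges;", len(params["psi"]), "nodes")
```

Keeping the ranges as module-level constants makes it easy to swap in other bounds if the assumed signs turn out to differ from the paper.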