reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Model-based Causal Discovery for Zero-Inflated Count Data

Authors: Junsouk Choi, Yang Ni

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive synthetic experiments as well as a real dataset with known ground truth demonstrate the superior performance of the proposed method against state-of-the-art alternative methods in discovering causal structure from observational zero-inflated count data. An application of reverse-engineering a gene regulatory network from a single-cell RNA-sequencing dataset illustrates the utility of ZiG-DAGs in practice.
Researcher Affiliation	Academia	Junsouk Choi EMAIL Department of Statistics Texas A&M University College Station, TX 98195-4322, USA. Yang Ni EMAIL Department of Statistics Texas A&M University College Station, TX 94720-1776, USA.
Pseudocode	Yes	Algorithm 1 Hill climbing. Algorithm 2 Tabu search.
Open Source Code	Yes	The R implementation of the proposed method is available in the R package ZiGDAG (https://github.com/junsoukchoi/ZiGDAG.git).
Open Datasets	Yes	We illustrate the utility of the proposed ZiG-DAG by performing two analyses of a scRNA-seq dataset (Li et al., 2017) that consists of 561 cells from 11 primary colorectal cancer (CRC) tumors and matched normal mucosa. First, from the TRRUST database (Han et al., 2018), we extract a list of literature-curated pairs of transcription factor and its target.
Dataset Splits	No	The paper mentions data generation parameters and sample sizes for synthetic data, e.g., "We sample data from the linear ZiG-DAG with different sample sizes n ∈ {250, 500, 1000, 2000}". For real data, it mentions filtering cells and retaining 472 cells (Section 6.2). However, it does not specify any training/testing/validation splits for reproducibility of experimental evaluation.
Hardware Specification	No	The paper does not provide specific hardware details such as CPU models, GPU models, or memory specifications used for running the experiments.
Software Dependencies	No	The R implementation of the proposed method is available in the R package ZiGDAG (https://github.com/junsoukchoi/ZiGDAG.git). In our experiments, MRS utilizes the R package MXM to estimate the skeleton of DAG. We filter cell doublets and multiplets using an R package for single cell genomics, Seurat (Hao et al., 2021). The paper mentions software packages like 'R package ZiGDAG', 'R package MXM', and 'Seurat' but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup	Yes	For each simulation setting, we set the causal DAG G by randomly generating a sparse DAG with d edges. Given the DAG, we generate coefficients (αjk, βjk) in (4) from independent uniform distributions: αjk ∼ U(0.5, 2) and βjk ∼ U(−2, −0.5) for k ∈ pa_G(j) and j ∈ V. The intercepts δj and γj in (4) are chosen uniformly at random from (−1.5, −1) and (1, 1.5), respectively. The additional parameters ψj for the GHPD (hyper-Poisson distribution) are sampled as log(ψj) ∼ U(−2, 2). For learning the nonlinear ZiG-DAG, we use Mf = Mg = 4 spline basis with a knot being placed at the 50% quantile of the data.
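
The parameter draws described in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (their implementation is the R package). The function name, argument names, returned dictionary layout, and the use of a random node ordering to obtain a sparse DAG are all assumptions made for the example. The interval endpoints are taken as βjk ∼ U(−2, −0.5), δj ∈ (−1.5, −1), and log(ψj) ∼ U(−2, 2); treat the negative signs as an assumption to be checked against the paper.

```python
import math
import random


def sample_simulation_parameters(num_nodes, num_edges, seed=0):
    """Draw one random sparse DAG and per-edge/per-node parameters.

    Hypothetical helper for illustration only; names and layout are assumed.
    """
    rng = random.Random(seed)

    # A random node ordering guarantees acyclicity: every edge points from
    # an earlier node to a later node in this ordering.
    order = list(range(num_nodes))
    rng.shuffle(order)
    candidates = [
        (order[a], order[b])  # (parent k, child j)
        for a in range(num_nodes)
        for b in range(a + 1, num_nodes)
    ]
    edges = rng.sample(candidates, num_edges)

    # Edge coefficients, keyed by (child j, parent k).
    alpha = {(j, k): rng.uniform(0.5, 2.0) for k, j in edges}
    beta = {(j, k): rng.uniform(-2.0, -0.5) for k, j in edges}

    # Node-level intercepts and the extra hyper-Poisson parameter psi_j,
    # sampled on the log scale and then exponentiated.
    delta = [rng.uniform(-1.5, -1.0) for _ in range(num_nodes)]
    gamma = [rng.uniform(1.0, 1.5) for _ in range(num_nodes)]
    psi = [math.exp(rng.uniform(-2.0, 2.0)) for _ in range(num_nodes)]

    return {
        "order": order,
        "edges": edges,
        "alpha": alpha,
        "beta": beta,
        "delta": delta,
        "gamma": gamma,
        "psi": psi,
    }


if __name__ == "__main__":
    params = sample_simulation_parameters(num_nodes=10, num_edges=10, seed=1)
    print(len(params["edges"]), "edges sampled")
```

Fixing the seed makes a given simulation setting repeatable, which matters here because the record reports no dataset splits or software versions. Sampling count data from the resulting model would additionally require the hyper-Poisson likelihood in equation (4) of the paper, which is not reproduced in this record.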