reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity

Authors: Yufei Huang, Changhu Wang, Junjie Tang, Weichi Wu, Ruibin Xi

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show the validity of the desirable properties of the models and the effective estimation methods, and demonstrate their advantages over the state-of-the-art network inference methods via extensive simulation studies and a gene regulatory network analysis of real single-cell RNA sequencing data. To evaluate the performance of our method, we conduct simulations on mixed count data and binary data.
Researcher Affiliation	Academia	1Center for Data Science, Peking University, Beijing, China 2School of Mathematical Sciences, Peking University, Beijing, China 3Center for Statistical Science, Peking University, Beijing, China 4Department of Statistics and Data Science, Tsinghua University, Beijing, China. Correspondence to: Ruibin Xi <EMAIL>, Weichi Wu <EMAIL>.
Pseudocode	Yes	The framework of EM-MMLE is summarized in Algorithm 1.
Open Source Code	Yes	All code is available at https://github.com/XiDsLab/EMMMLE.
Open Datasets	Yes	In this section, we evaluate the performance of EM-MMLE for gene regulatory network inference using a real scRNA-seq dataset (Zheng et al., 2017), comprising 6,952 cells across four cell types. The databases utilized in this study include STRING (Szklarczyk et al., 2019), HumanTFDB (Hu et al., 2019), hTFtarget (Zhang et al., 2020), ChEA (Lachmann et al., 2010), ChIP-Atlas (Oki et al., 2018), ChIPBase (Zhou et al., 2016), ESCAPE (Xu et al., 2013), TRRUST (Han et al., 2018), and RegNetwork (Liu et al., 2015).
Dataset Splits	Yes	The dataset includes two batches, sequenced by 3′ and 5′ scRNA-seq technologies. One batch is used to construct silver standards based on public regulatory network databases (Appendix C.1), while the other batch is reserved for algorithm testing.
Hardware Specification	No	Part of the analysis was performed on the high-performance computing platform of the Center for Life Sciences (Peking University).
Software Dependencies	No	The paper references various software and packages like 'Seurat (Stuart et al., 2019)', 'CVXR (Fu et al., 2020)', 'PPCOR (Kim, 2015)', and 'GENIE3 (Huynh-Thu et al., 2010)'. However, it does not provide specific version numbers for key software components or libraries used for the implementation of the paper's methodology itself, which are necessary for exact reproducibility.
Experiment Setup	No	The paper provides detailed parameters for generating synthetic data in its simulation studies (e.g., 'The number of populations G is set as 3, and the proportion parameter π is set as (1/3, 1/3, 1/3).', 'We set p = 50, 200 and evaluate the performance for three sample sizes: n = 200, 500, 3000.', 'we set α1 = 0.15'). It also mentions the use of AIC for tuning parameter selection. However, the main text does not contain specific hyperparameters (like learning rate, batch size, number of epochs, or optimizer settings) for training the EM-MMLE model or other methods compared.
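The simulation settings quoted in the Experiment Setup row, together with AIC-based tuning parameter selection, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`simulation_grid`, `draw_population_labels`, `aic`, `select_by_aic`) are hypothetical, population labels are assumed to be drawn from a plain categorical distribution with weights π, and the parameter count fed to AIC is left to the caller because the excerpt does not say how the paper counts it. The authors' actual implementation is in the linked EMMMLE repository.

```python
import itertools
import random

# Settings quoted in the Experiment Setup row.
G = 3                          # number of populations
PI = (1 / 3, 1 / 3, 1 / 3)     # mixing proportions pi (one per population)
P_VALUES = (50, 200)           # numbers of variables p
N_VALUES = (200, 500, 3000)    # sample sizes n


def simulation_grid():
    """Enumerate every (p, n) combination implied by the stated settings."""
    return list(itertools.product(P_VALUES, N_VALUES))


def draw_population_labels(n, pi, seed=0):
    """Hypothetical helper: assign each of n samples to one of len(pi)
    populations with probabilities pi (assumed categorical sampling)."""
    rng = random.Random(seed)
    return rng.choices(range(len(pi)), weights=pi, k=n)


def aic(log_likelihood, num_params):
    """Akaike information criterion, 2k - 2 log L; lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_by_aic(candidates):
    """candidates: iterable of (tuning_value, log_likelihood, num_params).
    Returns the tuning value whose fitted model has the smallest AIC."""
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]


if __name__ == "__main__":
    print(len(simulation_grid()))  # 6 (p, n) settings
    labels = draw_population_labels(200, PI, seed=1)
    print(len(labels))  # 200
    # Made-up candidate fits, for illustration only.
    print(select_by_aic([(0.1, -100.0, 20), (0.2, -105.0, 10)]))  # 0.2
```

In a real run, each candidate tuning value would correspond to a fitted model whose log-likelihood and effective number of parameters (for a graphical model, typically the number of nonzero edges plus other free parameters) come from the estimator; the values above are placeholders.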