A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity
Authors: Yufei Huang, Changhu Wang, Junjie Tang, Weichi Wu, Ruibin Xi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the validity of the desirable properties of the models and the effective estimation methods, and demonstrate their advantages over the state-of-the-art network inference methods via extensive simulation studies and a gene regulatory network analysis of real single-cell RNA sequencing data. To evaluate the performance of our method, we conduct simulations on mixed count data and binary data. |
| Researcher Affiliation | Academia | 1Center for Data Science, Peking University, Beijing, China 2School of Mathematical Sciences, Peking University, Beijing, China 3Center for Statistical Science, Peking University, Beijing, China 4Department of Statistics and Data Science, Tsinghua University, Beijing, China. Correspondence to: Ruibin Xi <EMAIL>, Weichi Wu <EMAIL>. |
| Pseudocode | Yes | The framework of EM-MMLE is summarized in Algorithm 1. |
| Open Source Code | Yes | All code is available at https://github.com/XiDsLab/EMMMLE. |
| Open Datasets | Yes | In this section, we evaluate the performance of EM-MMLE for gene regulatory network inference using a real scRNA-seq dataset (Zheng et al., 2017), comprising 6,952 cells across four cell types. The databases utilized in this study include STRING (Szklarczyk et al., 2019), HumanTFDB (Hu et al., 2019), hTFtarget (Zhang et al., 2020), ChEA (Lachmann et al., 2010), ChIP-Atlas (Oki et al., 2018), ChIPBase (Zhou et al., 2016), ESCAPE (Xu et al., 2013), TRRUST (Han et al., 2018), and RegNetwork (Liu et al., 2015). |
| Dataset Splits | Yes | The dataset includes two batches, sequenced by 3′ and 5′ scRNA-seq technologies. One batch is used to construct silver standards based on public regulatory network databases (Appendix C.1), while the other batch is reserved for algorithm testing. |
| Hardware Specification | No | Part of the analysis was performed on the high-performance computing platform of the Center for Life Sciences (Peking University). |
| Software Dependencies | No | The paper references various software and packages like 'Seurat (Stuart et al., 2019)', 'CVXR (Fu et al., 2020)', 'PPCOR (Kim, 2015)', and 'GENIE3 (Huynh-Thu et al., 2010)'. However, it does not provide specific version numbers for key software components or libraries used for the implementation of the paper's methodology itself, which are necessary for exact reproducibility. |
| Experiment Setup | No | The paper provides detailed parameters for generating synthetic data in its simulation studies (e.g., 'The number of populations G is set as 3, and the proportion parameter π is set as (1/3, 1/3, 1/3).', 'We set p = 50, 200 and evaluate the performance for three sample sizes: n = 200, 500, 3000.', 'we set α1 = 0.15'). It also mentions the use of AIC for tuning parameter selection. However, the main text does not contain specific hyperparameters (like learning rate, batch size, number of epochs, or optimizer settings) for training the EM-MMLE model or other methods compared. |
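
The simulation settings quoted in the Experiment Setup row can be written down as a small configuration sketch. The snippet below is only an illustration under stated assumptions, not the authors' code: the function names (`simulation_grid`, `draw_population_labels`, `aic`) are hypothetical, the label sampler only draws population memberships from the mixing proportions π and does not generate count or binary data, and `aic` is the textbook criterion 2k − 2 log L, which may differ in detail from the criterion used in the paper.

```python
import itertools
import random

# Values quoted from the paper's simulation description.
G = 3                        # number of populations
PI = (1 / 3, 1 / 3, 1 / 3)   # mixing proportions
P_VALUES = (50, 200)         # dimensions p
N_VALUES = (200, 500, 3000)  # sample sizes n


def simulation_grid():
    """Return every (p, n) combination from the quoted settings."""
    return list(itertools.product(P_VALUES, N_VALUES))


def draw_population_labels(n, pi=PI, seed=0):
    """Draw n population labels in {0, ..., G-1} with probabilities pi.

    Hypothetical helper: it covers only the mixture membership step,
    not the generation of the observed data.
    """
    rng = random.Random(seed)
    return rng.choices(range(len(pi)), weights=pi, k=n)


def aic(log_likelihood, num_params):
    """Textbook AIC, 2k - 2 log L (assumed form; lower is better)."""
    return 2 * num_params - 2 * log_likelihood


if __name__ == "__main__":
    for p, n in simulation_grid():
        labels = draw_population_labels(n)
        counts = [labels.count(g) for g in range(G)]
        print(f"p={p}, n={n}, population sizes={counts}")
```

Fixing the seed keeps the drawn labels repeatable across runs, which matters here because the table notes that software versions and training hyperparameters are not reported.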