Simultaneous Clustering and Estimation of Heterogeneous Graphical Models
Authors: Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity). |
| Researcher Affiliation | Academia | Botao Hao: Department of Statistics, Purdue University, West Lafayette, IN 47906, USA. Will Wei Sun: Department of Management Science, University of Miami School of Business Administration, Miami, FL 33146, USA. Yufeng Liu: Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. Guang Cheng: Department of Statistics, Purdue University, West Lafayette, IN 47906, USA. |
| Pseudocode | Yes | We summarize the high-dimensional ECM algorithm for solving the SCAN method in Table 1. Table 1: The SCAN Algorithm |
| Open Source Code | No | The paper states: "The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM." However, it does not provide any link, repository, or explicit statement of public availability for the code. |
| Open Datasets | Yes | For instance, in the glioblastoma multiforme (GBM) cancer dataset from The Cancer Genome Atlas Research Network (TCGA, 2008), Verhaak et al. (2010) showed that GBM cancer could be classified into four subtypes. |
| Dataset Splits | No | The paper describes how simulated data was generated (e.g., "n = 1000 observations from 2 clusters, and among them 500 observations are from N(µ1, Σ) and the rest 500 observations are from N(µ2, Σ)"). For the real dataset, it mentions using 482 GBM patients and evaluating clustering accuracy against true subtypes, but does not provide specific train/test/validation splits for model training or evaluation in a machine learning context. |
| Hardware Specification | Yes | The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM. |
| Software Dependencies | No | The paper mentions: "The code is written in R". However, it does not specify a version number for R or any specific R libraries/packages with their version numbers that would be required for reproducibility. |
| Experiment Setup | Yes | In our framework, the tuning parameters are selected through the following adaptive BIC-type selection criterion. In our simulations, we choose the tuning range $10^{-2+2t/15}$ with t = 0, 1, ..., 15 for all λ1, λ2, λ3. For the first scenario, we consider 3 simulation models with varying choices of µ and η: Model 1: µ = 0.8 and η = 0.3, Model 2: µ = 1 and η = 0.3, Model 3: µ = 1 and η = 0.4. Moreover, we set the tuning parameters λ1 = 0.065, λ2 = 0.238, and λ3 = 0.138 in our SCAN algorithm. |
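
The tuning range quoted in the Experiment Setup row can be written out explicitly. The following is a minimal Python sketch, not the authors' R code (which is not publicly linked): it only enumerates the 16 candidate values 10^(-2+2t/15) for t = 0, ..., 15 and records the three simulation models from the first scenario. The names `tuning_grid` and `SIMULATION_MODELS` are hypothetical, and the adaptive BIC-type criterion used to choose among the candidates is not reproduced here.

```python
# Illustrative sketch only: names below are assumptions, not taken from the paper.

def tuning_grid(num_steps: int = 15) -> list[float]:
    """Return the candidates 10^(-2 + 2t/num_steps) for t = 0, ..., num_steps."""
    return [10.0 ** (-2.0 + 2.0 * t / num_steps) for t in range(num_steps + 1)]


# The same candidate range is quoted for each of lambda1, lambda2, lambda3.
grid = tuning_grid()

# Simulation models of the first scenario, as quoted in the table row above.
SIMULATION_MODELS = {
    "Model 1": {"mu": 0.8, "eta": 0.3},
    "Model 2": {"mu": 1.0, "eta": 0.3},
    "Model 3": {"mu": 1.0, "eta": 0.4},
}

if __name__ == "__main__":
    for t, value in enumerate(grid):
        print(f"t = {t:2d}  candidate = {value:.4f}")
```

The grid is evenly spaced on a log scale, running from 0.01 at t = 0 up to 1 at t = 15.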