Simultaneous Clustering and Estimation of Heterogeneous Graphical Models

Authors: Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity).
Researcher Affiliation | Academia | Botao Hao, Department of Statistics, Purdue University, West Lafayette, IN 47906, USA; Will Wei Sun, Department of Management Science, University of Miami School of Business Administration, Miami, FL 33146, USA; Yufeng Liu, Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Guang Cheng, Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
Pseudocode | Yes | We summarize the high-dimensional ECM algorithm for solving the SCAN method in Table 1. (Table 1: The SCAN Algorithm)
Open Source Code | No | The paper states: "The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM." However, it does not provide any link, repository, or explicit statement of public availability for the code.
Open Datasets | Yes | For instance, in the glioblastoma multiforme (GBM) cancer dataset from The Cancer Genome Atlas Research Network (TCGA, 2008), Verhaak et al. (2010) showed that GBM cancer could be classified into four subtypes.
Dataset Splits | No | The paper describes how the simulated data were generated (e.g., "n = 1000 observations from 2 clusters, and among them 500 observations are from N(µ1, Σ) and the rest 500 observations are from N(µ2, Σ)"). For the real dataset, it mentions using 482 GBM patients and evaluating clustering accuracy against the true subtypes, but it does not provide specific train/test/validation splits for model training or evaluation in a machine learning context.
Hardware Specification | Yes | The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM.
Software Dependencies | No | The paper mentions: "The code is written in R". However, it does not specify a version number for R or any specific R libraries/packages with their version numbers that would be required for reproducibility.
Experiment Setup | Yes | In our framework, the tuning parameters are selected through the following adaptive BIC-type selection criterion. In our simulations, we choose the tuning range 10^(-2+2t/15) with t = 0, 1, ..., 15 for all λ1, λ2, λ3. For the first scenario, we consider 3 simulation models with varying choices of µ and η: Model 1: µ = 0.8 and η = 0.3; Model 2: µ = 1 and η = 0.3; Model 3: µ = 1 and η = 0.4. Moreover, we set the tuning parameters λ1 = 0.065, λ2 = 0.238, and λ3 = 0.138 in our SCAN algorithm.
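The Research Type row states that the paper's non-asymptotic error bound splits into a statistical error and an optimization error. The paper's exact theorem is not quoted here. Bounds of this kind in the high-dimensional EM literature typically have the generic shape below. This is an assumption about shape only. The symbols κ (a contraction rate in (0, 1)), C (a constant) and ε_n (a sample-size-dependent statistical rate) are placeholders and are not taken from the paper.

```latex
\|\widehat{\theta}^{(t)} - \theta^{*}\|
  \;\le\; \underbrace{\kappa^{t}\,\|\widehat{\theta}^{(0)} - \theta^{*}\|}_{\text{optimization error}}
  \;+\; \underbrace{\frac{C}{1-\kappa}\,\varepsilon_n}_{\text{statistical error}},
  \qquad 0 < \kappa < 1,
\\[4pt]
\kappa^{t}\,\|\widehat{\theta}^{(0)} - \theta^{*}\| \le \frac{C}{1-\kappa}\,\varepsilon_n
  \iff
  t \;\ge\; \frac{\log\!\big( (1-\kappa)\,\|\widehat{\theta}^{(0)} - \theta^{*}\| / (C\,\varepsilon_n) \big)}{\log(1/\kappa)} .
```

Under this generic form, the optimization term decays geometrically in the iteration count t. After roughly logarithmically many iterations, the statistical term dominates. Running the algorithm further then no longer improves the bound.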
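The Pseudocode row refers to Table 1 of the paper, which is not reproduced here. The Python/NumPy sketch below illustrates the general shape of an ECM (expectation / conditional-maximization) iteration for a Gaussian mixture with cluster-specific means and precision matrices. It is not the SCAN algorithm. The source lists three tuning parameters λ1, λ2, λ3, but this sketch keeps only an L1 penalty on the means (`lam1`, solved by coordinate descent). It replaces any penalized precision update with a ridge-stabilized inverse of the weighted covariance (`ridge` is an assumed stabilizer). The farthest-point initialization and all function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def log_gauss(X, mu, omega):
    """Row-wise log N(x | mu, omega^{-1}), parameterized by the precision matrix omega."""
    p = X.shape[1]
    d = X - mu
    _, logdet = np.linalg.slogdet(omega)
    quad = np.einsum("ij,jk,ik->i", d, omega, d)
    return 0.5 * (logdet - quad - p * np.log(2 * np.pi))


def update_mean(xbar, omega, n_k, lam1, mu, sweeps=20):
    """CM-step for one mean with omega held fixed: minimize
    (n_k / 2) (mu - xbar)' omega (mu - xbar) + lam1 * ||mu||_1 by cyclic coordinate descent."""
    mu = mu.copy()
    for _ in range(sweeps):
        for j in range(len(mu)):
            d = mu - xbar
            r = omega[j] @ d - omega[j, j] * d[j]  # sum over l != j of omega[j, l] * d[l]
            mu[j] = soft_threshold(xbar[j] - r / omega[j, j], lam1 / (n_k * omega[j, j]))
    return mu


def ecm_mixture(X, K, lam1=0.0, ridge=1e-3, n_iter=50):
    """Hypothetical ECM loop: E-step, then CM-steps for weights, means, precisions."""
    n, p = X.shape
    # Farthest-point initialization of the means (an assumption, not from the source).
    idx = [0]
    for _ in range(1, K):
        dist = np.min([np.sum((X - X[i]) ** 2, axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(dist)))
    mus = X[idx].copy()
    omegas = np.array([np.eye(p) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed in log space for stability.
        logp = np.column_stack(
            [np.log(pis[k]) + log_gauss(X, mus[k], omegas[k]) for k in range(K)]
        )
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        for k in range(K):
            w = gamma[:, k]
            n_k = w.sum()
            pis[k] = n_k / n  # CM-step 1: mixing weight
            xbar = w @ X / n_k  # weighted cluster mean
            mus[k] = update_mean(xbar, omegas[k], n_k, lam1, mus[k])  # CM-step 2
            d = X - mus[k]
            S = (w[:, None] * d).T @ d / n_k  # weighted covariance around the new mean
            omegas[k] = np.linalg.inv(S + ridge * np.eye(p))  # CM-step 3 (no sparsity penalty)
    return pis, mus, omegas, gamma
```

Each CM-step maximizes over one block of parameters while the others are held fixed. Under that structure, a penalized precision solver (for example, a graphical-lasso-type update) could replace the inverse in CM-step 3 without changing the loop.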
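The Dataset Splits row quotes the simulation design (500 draws from N(µ1, Σ) and 500 from N(µ2, Σ)) and mentions evaluating clustering accuracy against true labels. Below is a minimal NumPy sketch of both steps. The source does not state µ1, µ2 or Σ, so the AR(1) covariance `ar1_cov` is a hypothetical choice, as is any mean vector passed in. Accuracy is computed as the best agreement over all relabelings of the K clusters. This is a common convention, and the paper may define accuracy differently.

```python
import itertools

import numpy as np


def simulate_two_clusters(mu1, mu2, sigma, n_per_cluster=500, seed=0):
    """Draw n_per_cluster rows from N(mu1, sigma) and n_per_cluster rows from N(mu2, sigma)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.multivariate_normal(mu1, sigma, size=n_per_cluster),
        rng.multivariate_normal(mu2, sigma, size=n_per_cluster),
    ])
    labels = np.repeat([0, 1], n_per_cluster)
    return X, labels


def ar1_cov(p, rho):
    """Hypothetical covariance choice: AR(1) structure with entries rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def clustering_accuracy(true_labels, pred_labels, K):
    """Fraction of points labeled correctly, maximized over all K! relabelings of pred_labels."""
    best = 0.0
    for perm in itertools.permutations(range(K)):
        mapped = np.array(perm)[pred_labels]  # rename predicted cluster c to perm[c]
        best = max(best, float(np.mean(mapped == true_labels)))
    return best
```

Enumerating permutations costs K! evaluations. For the four GBM subtypes mentioned in the Open Datasets row, that is only 24 relabelings.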
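The Experiment Setup row gives the tuning range 10^(-2+2t/15), t = 0, 1, ..., 15. These are 16 log-spaced values from 0.01 to 1, shared by λ1, λ2 and λ3. The paper's adaptive BIC-type criterion is not reproduced in the source. This sketch therefore substitutes the standard BIC, -2 × log-likelihood + log(n) × df. `fit` is a hypothetical callable that returns the log-likelihood and degrees of freedom for one (λ1, λ2, λ3) triple.

```python
import itertools
import math

# Tuning grid from the source: 10 ** (-2 + 2t/15) for t = 0, ..., 15.
GRID = [10 ** (-2 + 2 * t / 15) for t in range(16)]


def bic(loglik, df, n):
    """Standard BIC (not the paper's adaptive variant): -2 * loglik + log(n) * df."""
    return -2.0 * loglik + math.log(n) * df


def select_by_bic(fit, n, grid=GRID):
    """Exhaustive search over (lam1, lam2, lam3) in grid^3.

    `fit` is a hypothetical user-supplied callable returning (loglik, df).
    Returns (best_score, (lam1, lam2, lam3))."""
    best = None
    for lams in itertools.product(grid, repeat=3):
        loglik, df = fit(*lams)
        score = bic(loglik, df, n)
        if best is None or score < best[0]:
            best = (score, lams)
    return best
```

A full search over this grid requires 16^3 = 4096 model fits. A cheaper alternative, stated here only as a design option and not as the paper's procedure, is to tune one parameter at a time with the other two held fixed.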