reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Simultaneous Clustering and Estimation of Heterogeneous Graphical Models

Authors: Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The superior performance of our method is demonstrated by extensive experiments and its application to a Glioblastoma cancer dataset reveals some new insights in understanding the Glioblastoma cancer. In theory, a non-asymptotic error bound is established for the output directly from our high dimensional ECM algorithm, and it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity).
Researcher Affiliation	Academia	Botao Hao EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA Will Wei Sun EMAIL Department of Management Science University of Miami School of Business Administration Miami, FL 33146, USA Yufeng Liu EMAIL Department of Statistics and Operations Research Department of Genetics Department of Biostatistics Carolina Center for Genome Sciences Lineberger Comprehensive Cancer Center University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA Guang Cheng EMAIL Department of Statistics Purdue University West Lafayette, IN 47906, USA
Pseudocode	Yes	We summarize the high-dimensional ECM algorithm for solving the SCAN method in Table 1. Table 1: The SCAN Algorithm
Open Source Code	No	The paper states: "The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM." However, it does not provide any link, repository, or explicit statement of public availability for the code.
Open Datasets	Yes	For instance, in the glioblastoma multiforme (GBM) cancer dataset from The Cancer Genome Atlas Research Network (TCGA, 2008), Verhaak et al. (2010) showed that GBM cancer could be classified into four subtypes.
Dataset Splits	No	The paper describes how simulated data was generated (e.g., "n = 1000 observations from 2 clusters, and among them 500 observations are from N(µ1, Σ) and the rest 500 observations are from N(µ2, Σ)"). For the real dataset, it mentions using 482 GBM patients and evaluating clustering accuracy against true subtypes, but does not provide specific train/test/validation splits for model training or evaluation in a machine learning context.
Hardware Specification	Yes	The code is written in R and implemented on an Intel Xeon E5 processor with 64 GB of RAM.
Software Dependencies	No	The paper mentions: "The code is written in R". However, it does not specify a version number for R or any specific R libraries/packages with their version numbers that would be required for reproducibility.
Experiment Setup	Yes	In our framework, the tuning parameters are selected through the following adaptive BIC-type selection criterion. In our simulations, we choose the tuning range 10^{-2+2t/15} with t = 0, 1, ..., 15 for all λ1, λ2, λ3. For the first scenario, we consider 3 simulation models with varying choices of µ and η: Model 1: µ = 0.8 and η = 0.3, Model 2: µ = 1 and η = 0.3, Model 3: µ = 1 and η = 0.4. Moreover, we set the tuning parameters λ1 = 0.065, λ2 = 0.238, and λ3 = 0.138 in our SCAN algorithm.
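
The tuning range quoted in the Experiment Setup row, 10^{-2+2t/15} for t = 0, 1, ..., 15, can be written out as a short sketch. The paper's code is in R and is not publicly linked, so the Python below is only an illustration of the quoted formula, not the authors' implementation; the variable names `t` and `lambda_grid` are assumptions introduced here.

```python
import numpy as np

# Candidate tuning values from the quoted formula 10^(-2 + 2t/15), t = 0..15.
# Names are hypothetical; the same grid is assumed for lambda1, lambda2, lambda3,
# as the quoted text says the range applies to all three.
t = np.arange(16)
lambda_grid = 10.0 ** (-2.0 + 2.0 * t / 15.0)

# The grid is log-spaced: 16 values starting at 10^-2 and ending at 10^0.
print(len(lambda_grid), lambda_grid[0], lambda_grid[-1])
```

The grid is equivalent to `np.logspace(-2, 0, 16)`. How the adaptive BIC-type criterion picks a value from this grid is not described in the quoted text, so that step is left out.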