Regularized Joint Mixture Models
Authors: Konstantinos Perrakis, Thomas Lartigue, Frank Dondelinger, Sach Mukherjee
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the key ideas via empirical examples. An R package is available at https://github.com/k-perrakis/regjmix. ... In Section 5 we present empirical examples, focusing initially on small-scale simulations and then proceeding to larger scale semi-synthetic experiments and applications to real data. |
| Researcher Affiliation | Academia | Konstantinos Perrakis EMAIL Department of Mathematical Sciences Durham University, UK; Thomas Lartigue EMAIL Aramis Project Team, Inria & Center of Applied Mathematics, CNRS, Ecole Polytechnique, IP Paris, France; Frank Dondelinger EMAIL Lancaster Medical School Lancaster UK; Sach Mukherjee EMAIL German Center for Neurodegenerative Diseases, Bonn, Germany & MRC Biostatistics Unit, University of Cambridge, UK |
| Pseudocode | No | The paper describes the expectation and maximization steps of the EM algorithm in Section 3.1 with mathematical equations (e.g., 'The E-Step.' and 'The M-Step.'). However, it does not present a clearly labeled 'Pseudocode' or 'Algorithm' block with structured, step-by-step instructions. The descriptions are provided in paragraph form supported by equations. |
| Open Source Code | Yes | An R package is available at https://github.com/k-perrakis/regjmix. ... The RJM methods presented in this paper are implemented as an R package regjmix, available at https://github.com/k-perrakis/regjmix. |
| Open Datasets | Yes | The simulations presented below are based on data from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov). ... We use the TCGA data as introduced above, with gene expression levels treated as features. |
| Dataset Splits | Yes | For all simulations we use $n = 250$, balanced group sample sizes, i.e. $n_k = 125$ for $k = 1, 2$ ... We use 80% of the samples for training and 20% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments. It mentions various R packages for software, but no hardware. |
| Software Dependencies | No | The paper mentions several R packages used (e.g., 'glassoFast', 'glmnet', 'mclust', 'MoEClust', 'flexmix', 'cluster', 'SwarmSVM') and cites their respective papers. However, it does not state version numbers for these packages or for the R environment itself, as required by the criteria. |
| Experiment Setup | Yes | For all simulations we use $n = 250$, balanced group sample sizes, i.e. $n_k = 125$ for $k = 1, 2$, and varying dimensionality for the features; namely, i) $p = 100$ ($n > p$ problem), ii) $p = 250$ ($n = p$ problem) and iii) $p = 500$ ($n < p$ problem). ... As a default option we use ten EM starts. For the termination of the algorithm we use a combination of two criteria that are commonly used in practice. The first is to simply set a maximum number ($T$) of EM iterations. Empirical results suggest that the option $T = 20$ is sufficient. The second criterion takes into account the relative change in the objective function in (15); namely, the algorithm is stopped when $\left\lvert \frac{Q(\theta, \tau, \lambda \mid \theta^{(t)}, \tau^{(t)}, \lambda^{(t)})}{Q(\theta, \tau, \lambda \mid \theta^{(t-1)}, \tau^{(t-1)}, \lambda^{(t-1)})} - 1 \right\rvert < \epsilon$, using as default option $\epsilon = 10^{-6}$. |
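
The termination rule quoted in the Experiment Setup row combines a cap on the number of EM iterations with a relative-change test on the objective. The Python sketch below illustrates only that stopping logic; it is not the code of the regjmix R package. The names `run_em`, `em_step`, and `objective` are hypothetical, the toy update stands in for real E- and M-steps, and the sketch assumes the objective value is never exactly zero.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # whatever object holds the current parameter estimates


def run_em(
    em_step: Callable[[P], P],        # hypothetical: one combined E+M update
    objective: Callable[[P], float],  # hypothetical: value of the objective Q
    params: P,
    max_iter: int = 20,               # T = 20, the default quoted above
    eps: float = 1e-6,                # epsilon = 10^-6, the default quoted above
) -> Tuple[P, int]:
    """Iterate until |Q_t / Q_{t-1} - 1| < eps or max_iter updates are done.

    Returns the final parameters and the number of updates performed.
    Assumes objective(params) is nonzero at every iterate.
    """
    q_prev = objective(params)
    for t in range(1, max_iter + 1):
        params = em_step(params)
        q_curr = objective(params)
        # Relative change in the objective between consecutive iterates.
        if abs(q_curr / q_prev - 1.0) < eps:
            return params, t
        q_prev = q_curr
    return params, max_iter


# Toy stand-in for EM: a contraction towards x = 2 with a smooth objective.
def toy_step(x: float) -> float:
    return (x + 2.0) / 2.0


def toy_objective(x: float) -> float:
    return -((x - 2.0) ** 2) - 1.0


x_hat, n_updates = run_em(toy_step, toy_objective, 0.0)
```

With ten EM starts, as in the quoted default, a loop of this kind would be run once per initialization and the best run kept; how the package selects among starts is not stated in the text above.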