Regularized Joint Mixture Models

Authors: Konstantinos Perrakis, Thomas Lartigue, Frank Dondelinger, Sach Mukherjee

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We illustrate the key ideas via empirical examples. An R package is available at https://github.com/k-perrakis/regjmix. ... In Section 5 we present empirical examples, focusing initially on small-scale simulations and then proceeding to larger scale semi-synthetic experiments and applications to real data.
Researcher Affiliation Academia Konstantinos Perrakis EMAIL Department of Mathematical Sciences Durham University, UK; Thomas Lartigue EMAIL Aramis Project Team, Inria & Center of Applied Mathematics, CNRS, Ecole Polytechnique, IP Paris, France; Frank Dondelinger EMAIL Lancaster Medical School Lancaster UK; Sach Mukherjee EMAIL German Center for Neurodegenerative Diseases, Bonn, Germany & MRC Biostatistics Unit, University of Cambridge, UK
Pseudocode No The paper describes the expectation and maximization steps of the EM algorithm in Section 3.1 with mathematical equations (e.g., 'The E-Step.' and 'The M-Step.'). However, it does not present a clearly labeled 'Pseudocode' or 'Algorithm' block with structured, step-by-step instructions. The descriptions are provided in paragraph form supported by equations.
Open Source Code Yes An R package is available at https://github.com/k-perrakis/regjmix. ... The RJM methods presented in this paper are implemented as an R package regjmix, available at https://github.com/k-perrakis/regjmix.
Open Datasets Yes The simulations presented below are based on data from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov). ... We use the TCGA data as introduced above, with gene expression levels treated as features.
Dataset Splits Yes For all simulations we use $n = 250$, balanced group sample sizes, i.e. $n_k = 125$ for $k = 1, 2$... We use 80% of the samples for training and 20% for testing.
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU types, or cloud instance specifications used for running the experiments. It mentions various R packages for software, but no hardware.
Software Dependencies No The paper mentions several R packages used (e.g., 'R package glassoFast', 'glmnet', 'mclust', 'MoEClust', 'flexmix', 'cluster', 'SwarmSVM') and cites their respective papers. However, it does not state version numbers for these packages or for the R environment itself, as the criteria require.
Experiment Setup Yes For all simulations we use $n = 250$, balanced group sample sizes, i.e. $n_k = 125$ for $k = 1, 2$, and varying dimensionality for the features; namely, i) $p = 100$ ($n > p$ problem), ii) $p = 250$ ($n = p$ problem) and iii) $p = 500$ ($n < p$ problem). ... As a default option we use ten EM starts. For the termination of the algorithm we use a combination of two criteria that are commonly used in practice. The first is to simply set a maximum number ($T$) of EM iterations. Empirical results suggest that the option $T = 20$ is sufficient. The second criterion takes into account the relative change in the objective function in (15); namely, the algorithm is stopped when $\left| \frac{Q(\theta, \tau, \lambda \mid \theta^{(t)}, \tau^{(t)}, \lambda^{(t)})}{Q(\theta, \tau, \lambda \mid \theta^{(t-1)}, \tau^{(t-1)}, \lambda^{(t-1)})} - 1 \right| < \epsilon$, using as default option $\epsilon = 10^{-6}$.
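
The two termination criteria quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the regjmix implementation (which is an R package). The names `run_em`, `em_step`, and `toy_step` are hypothetical, and the toy objective is invented purely to exercise the stopping rule. Only the defaults T = 20 and ϵ = 10^-6 come from the quoted text.

```python
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def run_em(
    init: State,
    em_step: Callable[[State], Tuple[State, float]],
    max_iter: int = 20,   # T = 20, the default quoted from the paper
    eps: float = 1e-6,    # epsilon = 1e-6, the default quoted from the paper
) -> Tuple[State, float, int]:
    """Iterate a generic EM update until one of two stopping criteria fires.

    `em_step` is a hypothetical callback: it performs one E-step and M-step
    and returns the updated state together with the objective value Q.
    Returns (final state, final Q, number of iterations performed).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    state = init
    q_prev = None
    q_curr = float("nan")
    for t in range(1, max_iter + 1):
        state, q_curr = em_step(state)
        # Criterion 2: relative change of the objective, |Q_t / Q_{t-1} - 1| < eps.
        # Skipped on the first iteration and when the previous value is zero.
        if q_prev is not None and q_prev != 0.0:
            if abs(q_curr / q_prev - 1.0) < eps:
                return state, q_curr, t
        q_prev = q_curr
    # Criterion 1: the maximum number of iterations was reached.
    return state, q_curr, max_iter


def toy_step(t: int) -> Tuple[int, float]:
    """Invented stand-in for an EM update: the objective approaches -1 geometrically."""
    t_next = t + 1
    return t_next, -(1.0 + 0.1 ** t_next)


if __name__ == "__main__":
    state, q, n_iter = run_em(0, toy_step)
    print(f"stopped after {n_iter} iterations with Q = {q:.8f}")
```

Running several random initialisations (the paper's default is ten EM starts) would amount to calling `run_em` once per start; how the best run is then selected is not specified in the quoted text, so it is left out of the sketch.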