Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

Authors: Leo L. Duan

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Data Experiments. Since most clustering approaches are based on a single view, we first compare our model with them using simulations. For a clear visualization, we generate data from the two-component mixture distribution in a single view, y_i ∈ R^2, with n = 400. Figure 7(a-f) plots the generated data under 6 different settings. ... Table 1: Normalized mutual information showing the accuracy of single-view clustering, using the data simulated in Figure 7. ... 5.2. Multi-View Experiments. We first use a simulation to assess the multi-view clustering performance, under a large V ≫ n. ... 5.2.2. Clustering UCI Handwritten Digits ... 5.2.3. Clustering Brains via RNA-Sequencing Data"
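The paper reports clustering accuracy as normalized mutual information (NMI, Table 1). As an illustrative sketch only, not the paper's code, the snippet below generates a two-component mixture in R^2 with n = 400 (mirroring the single-view simulation setup quoted above, with hypothetical centers and noise scale of our own choosing) and scores a baseline clustering with scikit-learn's NMI metric:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
n = 400
labels_true = rng.integers(0, 2, size=n)           # two components
centers = np.array([[0.0, 0.0], [3.0, 3.0]])       # assumed, for illustration
y = centers[labels_true] + rng.standard_normal((n, 2))

# Baseline single-view clustering, scored by NMI against the truth
labels_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(y)
nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(round(nmi, 3))
```

NMI lies in [0, 1], with 1 indicating a perfect match up to label permutation, which is why it is a standard accuracy measure for clustering comparisons like those in Table 1.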
Researcher Affiliation | Academia | Leo L. Duan, EMAIL, Department of Statistics, University of Florida, Gainesville, FL 32611, USA
Pseudocode | No | The paper describes the Expectation-Maximization (EM) algorithm in Section 3, detailing its steps and equations, but does not present a structured pseudocode block or algorithm environment.
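For context on what such pseudocode would cover: an EM algorithm alternates an E-step (compute posterior responsibilities under current parameters) with an M-step (re-estimate parameters from those responsibilities). The sketch below is a generic EM loop for a 1-D two-component Gaussian mixture; it illustrates the E-step/M-step structure only and is NOT the LSP model's algorithm.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """Generic EM for a two-component 1-D Gaussian mixture (illustrative)."""
    mu = np.array([x.min(), x.max()], dtype=float)   # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for weights, means, variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

# Toy data: two well-separated components at 0 and 5
x = np.concatenate([np.random.default_rng(1).normal(0, 1, 200),
                    np.random.default_rng(2).normal(5, 1, 200)])
pi, mu, sigma = em_gmm_1d(x)
print(np.sort(mu))
```

A structured block like this (initialization, E-step, M-step, stopping rule) is what a pseudocode environment for the paper's Section 3 algorithm would formalize.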
Open Source Code | Yes | The software is provided at https://github.com/leoduan/LatentSimplexPosition.
Open Datasets | Yes | The dataset is the UCI Dutch utility maps handwritten digits data (https://archive.ics.uci.edu/ml/datasets/Multiple+Features)... The data are obtained from the Allen Institute for Brain Science (Miller et al., 2017) (https://aging.brain-map.org/download/index)
Dataset Splits | No | The paper describes data generation for simulations and uses existing datasets for clustering evaluation, but it does not specify explicit training, validation, or test splits. In Sections 5.1 and 5.2.1 data are generated and clustered directly, and in Sections 5.2.2 and 5.2.3 existing datasets are clustered whole; no splits are mentioned.
Hardware Specification | No | The paper states in Section 5.2.1 that "the algorithm takes about 10 minutes to finish on a CUDA GPU with 11 GB of memory." It specifies the GPU type and memory but gives no exact model (e.g., NVIDIA Tesla V100), processor type, or clock speed.
Software Dependencies | No | The paper mentions using the Scikit-Learn Cluster package and the ADAM gradient descent algorithm but provides no version numbers for any software libraries, packages, or programming languages.
Experiment Setup | Yes | "In this article we use ϵ = 10^-3 as the threshold... we use α_λ = 1/d as a common choice... When fitting the LSP model, we use d = g = 10... When fitting the LSP model, we use d = V as its possible max value, and g = 10 as the known ground truth... The LSP model was initialized at d = 30 and g = 30."
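The stated hyperparameters can be collected in one place. The sketch below is our own summary of the settings quoted above; the variable names are hypothetical and need not match the paper's code:

```python
# Hyperparameters as reported in the paper's experiment setup.
EPSILON = 1e-3                 # threshold ϵ = 10^-3

def alpha_lambda(d):
    """Stated common choice α_λ = 1/d."""
    return 1.0 / d

# Per-experiment settings (d = latent dimension, g = number of groups)
simulation_cfg = {"d": 10, "g": 10}      # multi-view simulation (Sec. 5.2.1)
digits_cfg = {"d": "V", "g": 10}         # d set to its max V; g is ground truth
brain_cfg = {"d": 30, "g": 30}           # RNA-seq experiment initialization

print(alpha_lambda(simulation_cfg["d"]))
```

Such a consolidated configuration is exactly the kind of artifact that supports the "Yes" result for this variable: every numeric choice above is traceable to a sentence in the paper.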