Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification
Authors: Leo L. Duan
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Data Experiments: Since most clustering approaches are based on a single view, we first compare our model with them using simulations. For a clear visualization, we generate data from the two-component mixture distribution in a single view y_i ∈ R^2, with n = 400. Figure 7(a-f) plots the generated data under 6 different settings. ... Table 1: Normalized mutual information showing the accuracy of single-view clustering, using the data simulated in Figure 7. ... 5.2. Multi-View Experiments: We first use a simulation to assess the multi-view clustering performance, under a large V n. ... 5.2.2. Clustering UCI Handwritten Digits ... 5.2.3. Clustering Brains via RNA-Sequencing Data |
| Researcher Affiliation | Academia | Leo L. Duan EMAIL Department of Statistics University of Florida Gainesville, FL 32611, USA |
| Pseudocode | No | The paper describes the Expectation-Maximization (EM) algorithm in Section 3, detailing the steps and equations, but does not present a structured pseudocode block or algorithm. |
| Open Source Code | Yes | The software is provided on https://github.com/leoduan/LatentSimplexPosition. |
| Open Datasets | Yes | The dataset is the UCI Dutch utility maps handwritten digits data (https://archive.ics.uci.edu/ml/datasets/Multiple+Features)... The data are obtained from the Allen Institute for Brain Science (Miller et al., 2017) (https://aging.brainmap.org/download/index) |
| Dataset Splits | No | The paper describes data generation for simulations and uses existing datasets for clustering evaluation, but it does not specify explicit training, validation, or test splits for reproducibility. For example, in Section 5.1 and 5.2.1, data is generated or used for clustering directly, and in Section 5.2.2 and 5.2.3, existing datasets are used for clustering, but no specific splits are mentioned. |
| Hardware Specification | No | The paper mentions "the algorithm takes about 10 minutes to finish on a CUDA GPU with 11Gb of memory" in Section 5.2.1. While it specifies the type of GPU and its memory, it does not provide an exact model number (e.g., NVIDIA A100, Tesla V100), processor type, or speed. |
| Software Dependencies | No | The paper mentions using "Scikit-Learn Cluster package" and the "ADAM gradient descent algorithm" but does not provide specific version numbers for any software libraries, packages, or programming languages. |
| Experiment Setup | Yes | In this article we use ε = 10^-3 as the threshold... we use α_λ = 1/d as a common choice... When fitting the LSP model, we use d = g = 10... When fitting the LSP model, we use d = V as its possible max value, and g = 10 as the known ground truth... The LSP model was initialized at d = 30 and g = 30. |
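The setup values quoted above (threshold ε = 10^-3, concentration α_λ = 1/d, and the d and g settings) could be collected into a single configuration object to aid reproduction. The following is a minimal sketch under that assumption; the names `LSPConfig` and `has_converged` are illustrative and do not come from the paper's released code.

```python
# Hypothetical configuration for reproducing the reported experiment setup.
# Values are taken from the paper's text; all identifiers here are illustrative.
from dataclasses import dataclass


@dataclass
class LSPConfig:
    d: int = 10        # latent dimension (d = g = 10 in simulations; d = V or d = 30 elsewhere)
    g: int = 10        # number of clusters (g = 30 at initialization in one experiment)
    eps: float = 1e-3  # convergence threshold reported in the paper

    @property
    def alpha_lambda(self) -> float:
        # The paper reports alpha_lambda = 1/d as a common choice.
        return 1.0 / self.d


def has_converged(prev_obj: float, curr_obj: float, eps: float) -> bool:
    """Illustrative EM stopping rule: relative objective change below eps."""
    return abs(curr_obj - prev_obj) <= eps * max(abs(prev_obj), 1.0)


cfg = LSPConfig()
print(cfg.alpha_lambda)                            # 0.1 when d = 10
print(has_converged(-100.0, -100.05, cfg.eps))     # True: change within threshold
```

The stopping rule shown is one standard convention for EM-style iterations; the paper describes its EM algorithm in Section 3 but, as noted above, gives no pseudocode, so the exact criterion may differ.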