Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification
Authors: Leo L. Duan
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Data Experiments: Since most clustering approaches are based on a single view, we first compare our model with them using simulations. For a clear visualization, we generate data from the two-component mixture distribution in a single view y_i ∈ R^2, with n = 400. Figure 7(a-f) plots the generated data under 6 different settings. ... Table 1: Normalized mutual information showing the accuracy of single-view clustering, using the data simulated in Figure 7. ... 5.2. Multi-View Experiments: We first use a simulation to assess the multi-view clustering performance, under a large V n. ... 5.2.2. Clustering UCI Handwritten Digits ... 5.2.3. Clustering Brains via RNA-Sequencing Data |
| Researcher Affiliation | Academia | Leo L. Duan EMAIL Department of Statistics University of Florida Gainesville, FL 32611, USA |
| Pseudocode | No | The paper describes the Expectation-Maximization (EM) algorithm in Section 3, detailing the steps and equations, but does not present a structured pseudocode block or algorithm. |
| Open Source Code | Yes | The software is provided on https://github.com/leoduan/LatentSimplexPosition. |
| Open Datasets | Yes | The dataset is the UCI Dutch utility maps handwritten digits data (https://archive.ics.uci.edu/ml/datasets/Multiple+Features)... The data are obtained from the Allen Institute for Brain Science (Miller et al., 2017) (https://aging.brainmap.org/download/index) |
| Dataset Splits | No | The paper describes data generation for simulations and uses existing datasets for clustering evaluation, but it does not specify explicit training, validation, or test splits for reproducibility. For example, in Section 5.1 and 5.2.1, data is generated or used for clustering directly, and in Section 5.2.2 and 5.2.3, existing datasets are used for clustering, but no specific splits are mentioned. |
| Hardware Specification | No | The paper mentions "the algorithm takes about 10 minutes to finish on a CUDA GPU with 11Gb of memory" in Section 5.2.1. While it specifies the type of GPU and its memory, it does not provide an exact model number (e.g., NVIDIA A100, Tesla V100), processor type, or speed. |
| Software Dependencies | No | The paper mentions using "Scikit-Learn Cluster package" and the "ADAM gradient descent algorithm" but does not provide specific version numbers for any software libraries, packages, or programming languages. |
| Experiment Setup | Yes | In this article we use ε = 10^-3 as the threshold... we use α_λ = 1/d as a common choice... When fitting the LSP model, we use d = g = 10... When fitting the LSP model, we use d = V as its possible max value, and g = 10 as the known ground truth... The LSP model was initialized at d = 30 and g = 30. |
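The setup values quoted above (threshold ε = 10^-3, concentration α_λ = 1/d, and the d and g settings) could be collected into a single configuration object to aid reproduction. The following is a minimal sketch under that assumption; the names `LSPConfig` and `has_converged` are illustrative and do not come from the paper's released code.

```python
# Hypothetical configuration for reproducing the reported experiment setup.
# Values are taken from the paper's text; all identifiers here are illustrative.
from dataclasses import dataclass


@dataclass
class LSPConfig:
    d: int = 10        # latent dimension (d = g = 10 in simulations; d = V or d = 30 elsewhere)
    g: int = 10        # number of clusters (g = 30 at initialization in one experiment)
    eps: float = 1e-3  # convergence threshold reported in the paper

    @property
    def alpha_lambda(self) -> float:
        # The paper reports alpha_lambda = 1/d as a common choice.
        return 1.0 / self.d


def has_converged(prev_obj: float, curr_obj: float, eps: float) -> bool:
    """Illustrative EM stopping rule: relative objective change below eps."""
    return abs(curr_obj - prev_obj) <= eps * max(abs(prev_obj), 1.0)


cfg = LSPConfig()
print(cfg.alpha_lambda)                            # 0.1 when d = 10
print(has_converged(-100.0, -100.05, cfg.eps))     # True: change within threshold
```

The stopping rule shown is one standard convention for EM-style iterations; the paper describes its EM algorithm in Section 3 but, as noted above, gives no pseudocode, so the exact criterion may differ.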