reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Large sample spectral analysis of graph-based multi-manifold clustering

Authors: Nicolas Garcia Trillos, Pengfei He, Chenghui Li

JMLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive numerical experiments expand the insights that our theory provides on the MMC problem. ... We present a series of numerical experiments aimed at achieving these two goals. In addition, at the end of this section we compare the performance of spectral clustering using path-based graphs with other spectral-based algorithms by testing them on synthetic and real data sets.
Researcher Affiliation	Academia	Nicolás García Trillos EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA; Pengfei He EMAIL Department of Statistics and Probability Michigan State University East Lansing, MI, USA; Chenghui Li EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA. All listed institutions are universities, and email domains are .edu, indicating academic affiliations.
Pseudocode	Yes	Algorithm 1 Annular proximity graph with angle constraints
Open Source Code	Yes	1. The implementation of our algorithm can be found in github.com/chl781/manifold-clustering
Open Datasets	Yes	In this section we compare misclustering rates when we test algorithms on subsets of the MNIST data set consisting of different pairs of digits.
Dataset Splits	No	The paper mentions using subsets of the MNIST dataset and applying preprocessing steps, but it does not specify explicit training, validation, or test splits used for its experiments, nor does it refer to predefined standard splits for its evaluations.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for conducting the numerical experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or solver versions).
Experiment Setup	Yes	Unless otherwise noted, whenever we use the nearest neighbor version of our algorithm we will select $k_- = \tfrac{2}{3} k_+$ and tune $k_+$ in order to minimize the misclustering rate of the output clusters. In the toy examples where we use the $(\varepsilon_+, \varepsilon_-)$ version of our algorithm, we tune $\varepsilon_+$ and $\varepsilon_-$ so that $v_m \varepsilon_+^m =: k_+/N$ and $v_m \varepsilon_-^m =: k_-/N$, where $v_m$ is the volume of the unit ball in $\mathbb{R}^m$. The other parameters in the algorithm, $\alpha$ and $\kappa$ (or $r$), are tuned to minimize the misclustering rate.
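
The relation between the radii and the neighbor counts quoted in the Experiment Setup row can be illustrated with a short sketch. It assumes the reading v_m ε^m = k/N of the quoted passage and the ratio k₋ = (2/3) k₊. The function names below are hypothetical and are not taken from the authors' repository.

```python
import math


def unit_ball_volume(m: int) -> float:
    """Volume v_m of the unit ball in R^m (standard closed form)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def radius_from_neighbors(k: float, n: int, m: int) -> float:
    """Solve v_m * eps**m = k / n for eps.

    Assumption: this is the relation quoted in the Experiment Setup row,
    with n sample points on an m-dimensional manifold.
    """
    return (k / (n * unit_ball_volume(m))) ** (1.0 / m)


def annulus_radii(k_plus: float, n: int, m: int, ratio: float = 2 / 3):
    """Return (eps_minus, eps_plus) for k_minus = ratio * k_plus.

    The default ratio 2/3 follows the quoted setup; k_plus itself is a
    tuning parameter and is not fixed by this sketch.
    """
    k_minus = ratio * k_plus
    return (
        radius_from_neighbors(k_minus, n, m),
        radius_from_neighbors(k_plus, n, m),
    )


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    eps_minus, eps_plus = annulus_radii(k_plus=30, n=2000, m=2)
    print(f"eps_minus = {eps_minus:.4f}, eps_plus = {eps_plus:.4f}")
```

The sketch only converts between the two parameterizations. It does not build the annular proximity graph or apply the angle constraints of Algorithm 1, and α and κ (or r) still have to be tuned separately.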