Large sample spectral analysis of graph-based multi-manifold clustering

Authors: Nicolas Garcia Trillos, Pengfei He, Chenghui Li

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical experiments expand the insights that our theory provides on the MMC problem. ... We present a series of numerical experiments aimed at achieving these two goals. In addition, at the end of this section we compare the performance of spectral clustering using path-based graphs with other spectral-based algorithms by testing them on synthetic and real data sets.
Researcher Affiliation | Academia | Nicolás García Trillos EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA; Pengfei He EMAIL Department of Statistics and Probability Michigan State University East Lansing, MI, USA; Chenghui Li EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA. All listed institutions are universities, and email domains are .edu, indicating academic affiliations.
Pseudocode | Yes | Algorithm 1 Annular proximity graph with angle constraints
Open Source Code | Yes | The implementation of our algorithm can be found in github.com/chl781/manifold-clustering
Open Datasets | Yes | In this section we compare misclustering rates when we test algorithms on subsets of the MNIST data set consisting of different pairs of digits.
Dataset Splits | No | The paper mentions using subsets of the MNIST dataset and applying preprocessing steps, but it does not specify explicit training, validation, or test splits used for its experiments, nor does it refer to predefined standard splits for its evaluations.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for conducting the numerical experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or solver versions).
Experiment Setup | Yes | Unless otherwise noted, whenever we use the nearest neighbor version of our algorithm we will select $k_- = \tfrac{2}{3} k_+$ and tune $k_+$ in order to minimize the misclustering rate of the output clusters. In the toy examples where we use the $(\varepsilon_+, \varepsilon_-)$ version of our algorithm, we tune $\varepsilon_+$ and $\varepsilon_-$ so that $v_m \varepsilon_+^m =: k_+/N$ and $v_m \varepsilon_-^m =: k_-/N$, where $v_m$ is the volume of the unit ball in $\mathbb{R}^m$. The other parameters in the algorithm, $\alpha$ and $\kappa$ (or $r$), are tuned to minimize the misclustering rate.
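
The relation quoted in the Experiment Setup entry, that the volume of a ball of radius ε in R^m (v_m ε^m) equals k/N, can be inverted to obtain a radius from a target neighbor count. The following is a minimal sketch of that arithmetic only, not the authors' implementation. The function names (`unit_ball_volume`, `radius_for_neighbors`, `annulus_radii`) are hypothetical, and applying the 2/3 ratio between the inner and outer neighbor counts to the radius-based version is an assumption: the quoted text states that ratio only for the nearest neighbor version.

```python
import math


def unit_ball_volume(m: int) -> float:
    """Volume v_m of the unit ball in R^m: pi^(m/2) / Gamma(m/2 + 1)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def radius_for_neighbors(k: float, n: int, m: int) -> float:
    """Solve v_m * eps^m = k / n for eps (hypothetical helper)."""
    if k <= 0 or n <= 0 or m <= 0:
        raise ValueError("k, n and m must all be positive")
    return (k / (n * unit_ball_volume(m))) ** (1.0 / m)


def annulus_radii(k_plus: float, n: int, m: int, ratio: float = 2.0 / 3.0):
    """Return (eps_minus, eps_plus) for outer neighbor count k_plus.

    Assumption: k_minus = ratio * k_plus, with ratio = 2/3 borrowed from
    the nearest neighbor setting described in the summary.
    """
    k_minus = ratio * k_plus
    eps_minus = radius_for_neighbors(k_minus, n, m)
    eps_plus = radius_for_neighbors(k_plus, n, m)
    return eps_minus, eps_plus


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    eps_minus, eps_plus = annulus_radii(k_plus=30, n=2000, m=2)
    print(f"eps_minus = {eps_minus:.4f}, eps_plus = {eps_plus:.4f}")
```

Because both radii come from the same formula, the ratio of the two radii depends only on `ratio` and the dimension `m`, so tuning `k_plus` alone moves both radii together.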