Large sample spectral analysis of graph-based multi-manifold clustering
Authors: Nicolas Garcia Trillos, Pengfei He, Chenghui Li
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments expand the insights that our theory provides on the MMC problem. ... We present a series of numerical experiments aimed at achieving these two goals. In addition, at the end of this section we compare the performance of spectral clustering using path-based graphs with other spectral-based algorithms by testing them on synthetic and real data sets. |
| Researcher Affiliation | Academia | Nicolás García Trillos EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA; Pengfei He EMAIL Department of Statistics and Probability Michigan State University East Lansing, MI, USA; Chenghui Li EMAIL Department of Statistics University of Wisconsin Madison, Wisconsin, USA. All listed institutions are universities, and email domains are .edu, indicating academic affiliations. |
| Pseudocode | Yes | Algorithm 1 Annular proximity graph with angle constraints |
| Open Source Code | Yes | The implementation of our algorithm can be found in github.com/chl781/manifold-clustering |
| Open Datasets | Yes | In this section we compare misclustering rates when we test algorithms on subsets of the MNIST data set consisting of different pairs of digits. |
| Dataset Splits | No | The paper mentions using subsets of the MNIST dataset and applying preprocessing steps, but it does not specify explicit training, validation, or test splits used for its experiments, nor does it refer to predefined standard splits for its evaluations. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for conducting the numerical experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or solver versions). |
| Experiment Setup | Yes | Unless otherwise noted, whenever we use the nearest neighbor version of our algorithm we will select $k_- = \frac{2}{3} k_+$ and tune $k_+$ in order to minimize the misclustering rate of the output clusters. In the toy examples where we use the $(\varepsilon_+, \varepsilon_-)$ version of our algorithm, we tune $\varepsilon_+$ and $\varepsilon_-$ so that $v_m \varepsilon_+^m =: \frac{k_+}{N}$ and $v_m \varepsilon_-^m =: \frac{k_-}{N}$, where $v_m$ is the volume of the unit ball in $\mathbb{R}^m$. The other parameters in the algorithm, α and κ (or r), are tuned to minimize the misclustering rate. |
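
The parameter relation quoted in the Experiment Setup row can be illustrated with a short sketch. It assumes the relation reads as (volume of a ball of radius ε in R^m) = k / N, so that ε = (k / (N · v_m))^(1/m), and that the inner count is 2/3 of the outer count, as stated for the nearest neighbor version. The function names `unit_ball_volume` and `radii_from_neighbor_counts` are hypothetical and are not taken from the authors' repository.

```python
import math


def unit_ball_volume(m: int) -> float:
    """Volume v_m of the unit ball in R^m: pi^(m/2) / Gamma(m/2 + 1)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def radii_from_neighbor_counts(k_plus: float, n_points: int, m: int,
                               ratio: float = 2 / 3) -> tuple[float, float]:
    """Return (eps_plus, eps_minus) such that v_m * eps^m = k / N.

    Assumption: k_minus = ratio * k_plus, with ratio = 2/3 as in the
    quoted setup; N is the number of sample points and m the intrinsic
    dimension of the manifolds.
    """
    if k_plus <= 0 or n_points <= 0 or m <= 0:
        raise ValueError("k_plus, n_points and m must be positive")
    k_minus = ratio * k_plus
    v_m = unit_ball_volume(m)
    eps_plus = (k_plus / (n_points * v_m)) ** (1 / m)
    eps_minus = (k_minus / (n_points * v_m)) ** (1 / m)
    return eps_plus, eps_minus


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    eps_plus, eps_minus = radii_from_neighbor_counts(k_plus=30, n_points=2000, m=2)
    print(f"eps_plus={eps_plus:.4f}, eps_minus={eps_minus:.4f}")
```

Under this reading, tuning k+ alone fixes both radii, which matches the quoted description of tuning a single neighbor count and deriving the second one from it.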