Spectral Clustering Based on Local PCA

Authors: Ery Arias-Castro, Gilad Lerman, Teng Zhang

JMLR 2017 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We establish theoretical guarantees for simpler variants within a prototypical mathematical framework for multi-manifold clustering, and evaluate our algorithm on various simulated data sets. ... We tested our algorithm in more elaborate settings, some of them described in Section 4. ... In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. ... We first run Algorithm 4 on several artificial data sets, which are demonstrated in the LHS of Figures 3 and 4. Table 1 reports the local radius r used for each data set (R is the global radius of each data set), and the statistics for misclustering rates. ... In another simulation, we show the dependence of the success of our algorithm on the intersecting angle between curves in Table 2 and Figure 5. ... Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects. ... We record the misclustering rates of Algorithm 5, SMCE and LLE in Table 3.
Researcher Affiliation Academia Ery Arias-Castro EMAIL Department of Mathematics University of California, San Diego La Jolla, CA 92093, USA Gilad Lerman EMAIL Department of Mathematics University of Minnesota, Twin Cities Minneapolis, MN 55455, USA Teng Zhang EMAIL Department of Mathematics University of Central Florida Orlando, FL 32816, USA
Pseudocode Yes Algorithm 1 Spectral Graph Partitioning (Ng, Jordan, and Weiss, 2002) ... Algorithm 2 Connected Component Extraction: Comparing Covariances ... Algorithm 3 Connected Component Extraction: Comparing Projections ... Algorithm 4 Spectral Clustering Based on Local PCA ... Algorithm 5 Spectral Clustering Based on Local PCA (for small data sets)
Open Source Code Yes The code is available online at https://math.cos.ucf.edu/tengz.
Open Datasets Yes Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects.
Dataset Splits No The paper describes experiments on simulated and real-world datasets for clustering, which typically do not involve explicit training/validation/test splits in the same way as supervised learning tasks. For the simulated datasets, it mentions
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies No The paper does not provide specific software names with version numbers (e.g., Python 3.x, specific library versions) that would be needed to replicate the experiment.
Experiment Setup Yes In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. We choose the spatial scale ε and the projection scale η automatically as follows: we let ε = max 1 i n0 min j =i yi yj , and η = median (i,j): yi yj <ε Qi Qj . ... The neighborhood radius r is chosen by hand for each situation. ... For SMCE3, λ = 10 and L = 60, and we remark that similar results are obtained for a wide range of parameters. For LLE, we follow the implementation in (Polito and Perona, 2001), use 10-nearest neighbors to embed the data set into R2 and run K-means on the embedded data set. ... For Algorithm 5, we let the neighborhood size be 40.