Spectral Clustering Based on Local PCA
Authors: Ery Arias-Castro, Gilad Lerman, Teng Zhang
JMLR 2017 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We establish theoretical guarantees for simpler variants within a prototypical mathematical framework for multi-manifold clustering, and evaluate our algorithm on various simulated data sets. ... We tested our algorithm in more elaborate settings, some of them described in Section 4. ... In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. ... We first run Algorithm 4 on several artificial data sets, which are demonstrated in the LHS of Figures 3 and 4. Table 1 reports the local radius r used for each data set (R is the global radius of each data set), and the statistics for misclustering rates. ... In another simulation, we show the dependence of the success of our algorithm on the intersecting angle between curves in Table 2 and Figure 5. ... Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects. ... We record the misclustering rates of Algorithm 5, SMCE and LLE in Table 3. |
| Researcher Affiliation | Academia | Ery Arias-Castro EMAIL Department of Mathematics University of California, San Diego La Jolla, CA 92093, USA Gilad Lerman EMAIL Department of Mathematics University of Minnesota, Twin Cities Minneapolis, MN 55455, USA Teng Zhang EMAIL Department of Mathematics University of Central Florida Orlando, FL 32816, USA |
| Pseudocode | Yes | Algorithm 1 Spectral Graph Partitioning (Ng, Jordan, and Weiss, 2002) ... Algorithm 2 Connected Component Extraction: Comparing Covariances ... Algorithm 3 Connected Component Extraction: Comparing Projections ... Algorithm 4 Spectral Clustering Based on Local PCA ... Algorithm 5 Spectral Clustering Based on Local PCA (for small data sets) |
| Open Source Code | Yes | The code is available online at https://math.cos.ucf.edu/tengz. |
| Open Datasets | Yes | Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects. |
| Dataset Splits | No | The paper describes experiments on simulated and real-world datasets for clustering, which typically do not involve explicit training/validation/test splits in the same way as supervised learning tasks. For the simulated datasets, it mentions |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python 3.x, specific library versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. We choose the spatial scale ε and the projection scale η automatically as follows: we let ε = max 1 i n0 min j =i yi yj , and η = median (i,j): yi yj <ε Qi Qj . ... The neighborhood radius r is chosen by hand for each situation. ... For SMCE3, λ = 10 and L = 60, and we remark that similar results are obtained for a wide range of parameters. For LLE, we follow the implementation in (Polito and Perona, 2001), use 10-nearest neighbors to embed the data set into R2 and run K-means on the embedded data set. ... For Algorithm 5, we let the neighborhood size be 40. |