reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Spectral Clustering Based on Local PCA

Authors: Ery Arias-Castro, Gilad Lerman, Teng Zhang

JMLR 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We establish theoretical guarantees for simpler variants within a prototypical mathematical framework for multi-manifold clustering, and evaluate our algorithm on various simulated data sets. ... We tested our algorithm in more elaborate settings, some of them described in Section 4. ... In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. ... We first run Algorithm 4 on several artificial data sets, which are demonstrated in the LHS of Figures 3 and 4. Table 1 reports the local radius r used for each data set (R is the global radius of each data set), and the statistics for misclustering rates. ... In another simulation, we show the dependence of the success of our algorithm on the intersecting angle between curves in Table 2 and Figure 5. ... Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects. ... We record the misclustering rates of Algorithm 5, SMCE and LLE in Table 3.
Researcher Affiliation	Academia	Ery Arias-Castro, Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA; Gilad Lerman, Department of Mathematics, University of Minnesota, Twin Cities, Minneapolis, MN 55455, USA; Teng Zhang, Department of Mathematics, University of Central Florida, Orlando, FL 32816, USA
Pseudocode	Yes	Algorithm 1 Spectral Graph Partitioning (Ng, Jordan, and Weiss, 2002) ... Algorithm 2 Connected Component Extraction: Comparing Covariances ... Algorithm 3 Connected Component Extraction: Comparing Projections ... Algorithm 4 Spectral Clustering Based on Local PCA ... Algorithm 5 Spectral Clustering Based on Local PCA (for small data sets)
Open Source Code	Yes	The code is available online at https://math.cos.ucf.edu/tengz.
Open Datasets	Yes	Next, we run experiments on the Extended Yale Face Database B (Lee et al., 2005), with the goal of clustering face images of two different subjects.
Dataset Splits	No	The paper describes experiments on simulated and real-world datasets for clustering, which typically do not involve explicit training/validation/test splits in the same way as supervised learning tasks. For the simulated datasets, it mentions
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software names with version numbers (e.g., Python 3.x, specific library versions) that would be needed to replicate the experiment.
Experiment Setup	Yes	In all experiments, the number of clusters K and the dimension of the manifolds d are assumed known. We choose the spatial scale $\varepsilon$ and the projection scale $\eta$ automatically as follows: we let $\varepsilon = \max_{1 \le i \le n_0} \min_{j \ne i} \|y_i - y_j\|$, and $\eta = \operatorname{median}_{(i,j):\, \|y_i - y_j\| < \varepsilon} \|Q_i - Q_j\|$. ... The neighborhood radius r is chosen by hand for each situation. ... For SMCE, λ = 10 and L = 60, and we remark that similar results are obtained for a wide range of parameters. For LLE, we follow the implementation in (Polito and Perona, 2001), use 10-nearest neighbors to embed the data set into $\mathbb{R}^2$ and run K-means on the embedded data set. ... For Algorithm 5, we let the neighborhood size be 40.
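
The automatic scale selection quoted in the Experiment Setup row can be illustrated with a short NumPy sketch. It is not the authors' code. The function names `local_projection` and `choose_scales` are hypothetical, the projections Q_i are assumed to come from a local PCA over the points within radius r of each center y_i, and the spectral norm for the distance between projections is an assumption, since the excerpt does not state which matrix norm is meant.

```python
import numpy as np


def local_projection(points, center, r, d):
    """Projection onto the top-d principal directions of the points
    lying within distance r of `center` (assumed form of local PCA)."""
    nbrs = points[np.linalg.norm(points - center, axis=1) <= r]
    cov = np.cov(nbrs, rowvar=False)
    _, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = vecs[:, -d:]                 # top-d eigenvectors
    return top @ top.T


def choose_scales(centers, projections):
    """Return (eps, eta) where
    eps = max_i min_{j != i} ||y_i - y_j|| and
    eta = median of ||Q_i - Q_j|| over pairs with ||y_i - y_j|| < eps."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)     # exclude j == i
    eps = dist.min(axis=1).max()

    # Each unordered pair once; the diagonal is inf, so it never qualifies.
    i, j = np.where(np.triu(dist < eps, k=1))
    # Spectral norm of Q_i - Q_j (assumed choice of matrix norm).
    proj_dist = np.linalg.norm(projections[i] - projections[j],
                               ord=2, axis=(1, 2))
    # If every nearest-neighbor distance is identical, no pair satisfies
    # the strict inequality and the median is undefined (NaN).
    eta = np.median(proj_dist)
    return eps, eta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 1.0, size=200)
    pts = np.column_stack([t, 2.0 * t])          # points on a line in R^2
    centers = pts[:20]
    Q = np.stack([local_projection(pts, c, r=0.3, d=1) for c in centers])
    print(choose_scales(centers, Q))
```

Because eps is the largest nearest-neighbor distance among the centers, every center has at least one other center within distance eps, and eta then sets the scale of what counts as a "typical" disagreement between neighboring local subspaces.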