Subspace Clustering through Sub-Clusters

Authors: Weiwei Li, Jan Hannig, Sayan Mukherjee

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The numerical results indicate that for large datasets the proposed algorithm outperforms other state-of-the-art subspace clustering algorithms with respect to accuracy and speed. Keywords: dimension reduction, subspace clustering, sub-cluster, random sampling, scalability, handwritten digits, spectral clustering. Finally, we study empirical properties of the proposed algorithm on both synthetic and real-world datasets selected to have diverse sizes. We show that the clustering through sub-clusters algorithm is highly scalable and can significantly boost the clustering accuracy on both the subset and whole dataset. Section 4: Experimental Results.
Researcher Affiliation | Academia | Weiwei Li, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA. Jan Hannig, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA. Sayan Mukherjee, Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, Durham, NC 27708, USA.
Pseudocode | Yes | Algorithm 1: Sub-cluster Based Subspace Clustering (SBSC) algorithm.
Open Source Code | Yes | The code used to generate these results can be found in the supplementary material.
Open Datasets | Yes | Section 4.2.1: The Extended Yale B dataset. Section 4.2.2: The Zipcode dataset (Le Cun et al., 1990). Section 4.2.3: The MNIST dataset (MNIST).
Dataset Splits | No | The paper discusses sampling for the algorithm itself (
Hardware Specification | No | The paper mentions running comparisons "on our machine" to keep them fair, but it does not give any specific hardware details such as CPU, GPU, or memory specifications.
Software Dependencies | No | The paper indicates that the implementation is written in Python and provided in the supplementary material (which is not included in the text), but it does not name any specific software libraries or their version numbers.
Experiment Setup | Yes | input: data $Y$, number of subspaces $K$, sampling size $n$, neighbor threshold $d_{\max}$, regularization parameters $\lambda_1$ and $\lambda_2$, residual minimization parameter $m$, affinity threshold $t_{\max}$. In our numerical experiments, we choose $n$ to be linear in $K \log N$. Ideally, each sub-cluster $Y_{C_i}$ should well represent the subspace it belongs to... Therefore we want $d_{\max}$ to be larger than $\max_{k=1,\dots,K} d_k$ ... For this reason we set $d_{\max}$ to grow linearly with $D$. Similarly the residual minimization parameter $m$ should also be linear in $D$. We choose the threshold index $t_{\max}$ to be $n - 2K$.
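The scaling rules quoted in the Experiment Setup row can be made concrete with a small sketch. The rules are: $n$ linear in $K \log N$, $d_{\max}$ and $m$ linear in $D$, and $t_{\max} = n - 2K$. The sketch also shows one way a residual minimization parameter $m$ could be used. Everything here is an assumption for illustration, not the authors' code:

- The helpers `sbsc_parameters` and `assign_by_residual` are hypothetical.
- The constants `c_n`, `c_d`, `c_m` are made up; the paper's proportionality constants are not given here.
- $D$ is read as the data (ambient) dimension.
- The nearest-subspace least-squares rule is a generic stand-in, not the paper's SBSC step.

```python
import math

import numpy as np


def sbsc_parameters(N, K, D, c_n=10.0, c_d=0.5, c_m=0.5, subspace_dims=None):
    """Turn the quoted scaling rules into concrete values (hypothetical helper).

    c_n, c_d and c_m are illustrative proportionality constants chosen for
    this sketch; they are NOT values reported in the paper.
    """
    n = math.ceil(c_n * K * math.log(N))  # sample size linear in K log N
    d_max = math.ceil(c_d * D)            # neighbor threshold linear in D
    m = math.ceil(c_m * D)                # residual parameter linear in D
    t_max = n - 2 * K                     # threshold index n - 2K, as stated
    # The text asks for d_max to exceed every subspace dimension d_k.
    if subspace_dims is not None and d_max <= max(subspace_dims):
        raise ValueError("d_max must exceed max_k d_k; increase c_d")
    return {"n": n, "d_max": d_max, "m": m, "t_max": t_max}


def assign_by_residual(y, clusters, m):
    """Assign point y to the cluster whose local span fits it best.

    Hypothetical rule, not taken from the paper: for each cluster (a D x n_k
    array with points as columns), use the m columns nearest to y as a basis,
    solve min_c ||y - B c||_2 by least squares, and return the argmin cluster.
    """
    residuals = []
    for Yk in clusters:
        dists = np.linalg.norm(Yk - y[:, None], axis=0)  # distance to each column
        basis = Yk[:, np.argsort(dists)[:m]]              # m nearest columns
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)  # least-squares fit
        residuals.append(float(np.linalg.norm(y - basis @ coef)))
    return int(np.argmin(residuals)), residuals


# Usage: parameter sizes, then a toy case with two lines through the origin in R^3.
params = sbsc_parameters(N=10_000, K=10, D=50)

rng = np.random.default_rng(0)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
clusters = [np.outer(u, rng.normal(size=20)), np.outer(v, rng.normal(size=20))]
label, res = assign_by_residual(2.5 * u, clusters, m=2)
print(params, label)
```

In the toy case the query point lies on the first line, so its residual against that cluster is essentially zero and it is assigned there. One plausible reading of why $m$ should grow with $D$ is that the local basis then has enough columns to span a subspace of dimension up to a fixed fraction of $D$. This sketch does not confirm that reading.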