reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Subspace Clustering through Sub-Clusters

Authors: Weiwei Li, Jan Hannig, Sayan Mukherjee

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The numerical results indicate that for large datasets the proposed algorithm outperforms other state-of-the-art subspace clustering algorithms with respect to accuracy and speed. Keywords: dimension reduction, subspace clustering, sub-cluster, random sampling, scalability, handwritten digits, spectral clustering. Finally, we study empirical properties of the proposed algorithm on both synthetic and real-world datasets selected to have diverse sizes. We show that the clustering through sub-clusters algorithm is highly scalable and can significantly boost the clustering accuracy on both the subset and whole dataset. Section 4: Experimental Results.
Researcher Affiliation	Academia	Weiwei Li EMAIL Department of Statistics and Operations Research University of North Carolina at Chapel Hill Chapel Hill, NC 27514, USA. Jan Hannig EMAIL Department of Statistics and Operations Research University of North Carolina at Chapel Hill Chapel Hill, NC 27514, USA. Sayan Mukherjee EMAIL Department of Statistical Science Mathematics, Computer Science, Biostatistics & Bioinformatics Duke University Durham, NC 27708, USA.
Pseudocode	Yes	Algorithm 1: Sub-cluster Based Subspace Clustering (SBSC) algorithm.
Open Source Code	Yes	The code used to generate these results can be found in the supplementary material.
Open Datasets	Yes	4.2.1 The Extended Yale B dataset. 4.2.2 The Zipcode dataset (Le Cun et al., 1990). 4.2.3 The MNIST dataset (MNIST).
Dataset Splits	No	The paper discusses sampling for the algorithm itself (
Hardware Specification	No	The paper mentions "on our machine" when comparing results for fair comparisons but does not provide any specific hardware details like CPU, GPU, or memory specifications.
Software Dependencies	No	The paper mentions implementation details in Python in the supplementary material (which is not provided in the text), but it does not specify any particular software libraries or their version numbers.
Experiment Setup	Yes	Input: Data Y, number of subspaces K, sampling size n, neighbor threshold d_max, regularization parameters λ_1 and λ_2, residual minimization parameter m, affinity threshold t_max. In our numerical experiments, we choose n to be linear in K log N. Ideally, each sub-cluster Y_{C_i} should well represent the subspace it belongs to... Therefore we want d_max to be larger than max_{k=1,...,K} d_k ... For this reason we set d_max to grow linearly with D. Similarly the residual minimization parameter m should also be linear in D. We choose the threshold index t_max to be n - 2K.
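
The parameter heuristics quoted in the Experiment Setup row can be sketched as a small helper. This is only an illustration, not the authors' code: the function name `sbsc_parameters` and the proportionality constants `c_n`, `c_d`, and `c_m` are hypothetical, since the quoted text states only that n is linear in K log N, that d_max and m are linear in D, and that t_max = n - 2K. The natural logarithm and the cap of n at N are likewise assumptions.

```python
import math


def sbsc_parameters(N, D, K, c_n=10.0, c_d=0.1, c_m=0.1):
    """Derive SBSC-style parameters from the data dimensions.

    N: number of data points, D: ambient dimension, K: number of subspaces.
    c_n, c_d, c_m are hypothetical proportionality constants; the quoted
    setup only says the relationships are linear, not what the slopes are.
    """
    if N < 2 or D < 1 or K < 1:
        raise ValueError("need N >= 2, D >= 1 and K >= 1")

    # Sampling size: linear in K log N (natural log assumed), capped at N.
    n = min(N, math.ceil(c_n * K * math.log(N)))

    # Neighbor threshold and residual minimization parameter: linear in D.
    d_max = max(1, math.ceil(c_d * D))
    m = max(1, math.ceil(c_m * D))

    # Affinity threshold index, as stated in the setup: t_max = n - 2K.
    t_max = n - 2 * K

    return {"n": n, "d_max": d_max, "m": m, "t_max": t_max}


if __name__ == "__main__":
    # Illustrative sizes only; these are not values reported in the paper.
    print(sbsc_parameters(N=70000, D=500, K=10))
```

The constants would need tuning per dataset. In particular, the quoted text asks for d_max to exceed the largest subspace dimension, which a fixed slope in D does not guarantee by itself.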