Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut
Authors: Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 12 real-world benchmark and 8 facial datasets validate the advantages of the proposed algorithm compared to the state-of-the-art clustering algorithms. In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively. |
| Researcher Affiliation | Academia | Shenfei Pei, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University; Feiping Nie, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University; Rong Wang, School of Cybersecurity and Center for OPTIMAL, Northwestern Polytechnical University; Xuelong Li, School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University |
| Pseudocode | Yes | Algorithm 1: An efficient program for solving problem (21). |
| Open Source Code | Yes | In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively [GitHub]. |
| Open Datasets | Yes | WebFace [50] and CelebA [23] are two large-scale public datasets available for face recognition and verification problems. CALFW [54] and CPLFW [53] are two variants of LFW aiming at cross-age and cross-pose face recognition, respectively. CACD [5], Adience [15], and FERET [35] are constructed for cross-age face retrieval, age and gender recognition, and facial recognition system evaluation. |
| Dataset Splits | No | The paper does not explicitly provide details about validation dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Both k-means and our code run on the Arch machine with 3.20 GHz i7-8700 CPU, 32 GB main memory. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn', 'C++', 'Dlib', and 'EFANNA', but it does not specify exact version numbers for any of these software dependencies. |
| Experiment Setup | Yes | The number of nearest neighbors k is fixed at 20 for 6 synthetic and 12 middle-scale real world datasets. The k-nearest neighbors graphs are generated by EFANNA [14] with k = 100 for all facial datasets. Every method takes 50 runs. The average results are reported. |
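
The experiment setup row describes two mechanical steps: building a k-nearest-neighbor graph and averaging results over 50 runs. The following is a minimal Python sketch of both steps, not the authors' code. It swaps in an exact brute-force neighbor search where the paper uses the approximate EFANNA library, and the names `knn_graph`, `average_over_runs`, and `run_once` are hypothetical, introduced here only for illustration.

```python
import math
import random
from statistics import mean


def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest neighbors.

    Assumption: exact brute-force search with Euclidean distance.
    The paper uses EFANNA, an approximate method; this is a stand-in.
    """
    neighbors = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def average_over_runs(run_once, n_runs=50, seed=0):
    """Call run_once(rng) n_runs times and return the mean score.

    Each run gets its own seeded random generator so that runs differ
    from each other but the whole experiment stays repeatable.
    """
    scores = [run_once(random.Random(seed + r)) for r in range(n_runs)]
    return mean(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(200)]

    # k = 20, as reported for the synthetic and middle-scale datasets.
    graph = knn_graph(points, k=20)
    print(len(graph), len(graph[0]))

    # Hypothetical placeholder score; a real run would cluster the data
    # and return a quality metric or a running time.
    print(average_over_runs(lambda r: r.random(), n_runs=50))
```

Brute-force search costs quadratic time in the number of points, so it only suits small examples; at the scale of the facial datasets an approximate method such as the one named in the table is the practical choice.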