Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
Authors: Ziwen Wang, Yancheng Yuan, Jiaming Ma, Tieyong Zeng, Defeng Sun
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical results demonstrate the robustness and superior performance of the randomly projected convex clustering model. The numerical results will also demonstrate that the randomly projected convex clustering model can outperform other popular clustering models on the dimension-reduced data, including the randomly projected K-means model. ... Extensive numerical experiment results will be presented in this paper to justify the theoretical guarantees and to demonstrate the superior performance and robustness of the proposed model. |
| Researcher Affiliation | Academia | Ziwen Wang, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong; Yancheng Yuan, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Jiaming Ma, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Tieyong Zeng, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong; Defeng Sun, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong |
| Pseudocode | No | The paper includes figures (Figure 2, Figure 3) showing the overall structure and procedures, but these are flowcharts/diagrams and not structured pseudocode or algorithm blocks. It refers to algorithms from other papers (e.g., AS-SSNAL) but does not present their pseudocode here. |
| Open Source Code | No | The paper does not contain any explicit statements from the authors releasing their code for the methodology described. It mentions using open-source implementations for baseline comparisons (e.g., k-means2 function, spectralcluster3 function), but this refers to third-party tools, not the authors' own implementation. |
| Open Datasets | Yes | Datasets: We conduct tests on the following real datasets: LIBRAS and LIBRAS-6 (Dias et al., 2009a), COIL-20 (Nene et al., 1996), LUNG (Lee et al., 2010), and MNIST (Le Cun et al., 1998). |
| Dataset Splits | No | The paper describes how simulated data was generated and mentions evaluating clustering paths over sequences of gamma values. However, it does not provide specific train/test/validation splits (percentages, counts, or references to predefined splits) for reproducibility of the data partitioning in its experiments on either simulated or real datasets. |
| Hardware Specification | Yes | All our computational results were obtained using MATLAB on a Windows laptop (Intel Core i7-8750H @ 2.20GHz, RAM 32GB). |
| Software Dependencies | No | The paper states that results were obtained using MATLAB but does not specify a MATLAB version number, nor does it list versions of any libraries, toolboxes, or external solvers, which would be necessary for full reproducibility. It refers to functions such as the 'K-means2' and 'spectralcluster3' functions, but these are MATLAB functions cited without explicit version information as external dependencies. |
| Experiment Setup | Yes | We adopt the stopping criterion in (Yuan et al., 2022) with a tolerance ϵtol = 10^-7. For more details of the algorithm, one can refer to (Yuan et al., 2022) and the references therein. We obtain a clustering path based on the inexact checking rule up to the tolerance ϵclust = 10^-5. ... We adopt the popular choice of weights constructed by the Gaussian kernel (42) with a k-nearest neighbors graph with k = 5 (Chi and Lange, 2015; Yuan et al., 2018; Sun et al., 2021). ... we directly test different scales of embedding dimensions m ∈ {10, 20, 50, 100, 200}. ... For LUNG, we set γ ∈ [6 : -0.01 : 0.1]. ... For (RP) KM++: We use the K-means2 function with Number of clusters = K... We set Max Iter = 1000 and Replicates = 30. |
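The experiment-setup row above names three reproducible ingredients: a random projection to m dimensions, Gaussian-kernel weights on a k-nearest-neighbors graph with k = 5, and a K-means baseline run with MaxIter = 1000 and Replicates = 30. The sketch below is an illustrative Python reconstruction of those ingredients, not the authors' MATLAB code; the function names and the kernel bandwidth `phi` are assumptions introduced here for illustration.

```python
import numpy as np

def random_projection(X, m, rng=None):
    """Project n x d data X to n x m with a Gaussian random matrix
    (Johnson-Lindenstrauss-style 1/sqrt(m) scaling)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    P = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ P

def gaussian_knn_weights(X, k=5, phi=0.5):
    """Symmetric k-NN graph with Gaussian kernel weights
    w_ij = exp(-phi * ||x_i - x_j||^2).  The bandwidth `phi` is an
    assumed value; the paper's choice is not quoted in this table."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]        # skip self (distance 0)
        W[i, idx] = np.exp(-phi * d2[i, idx])
    return np.maximum(W, W.T)                    # symmetrize the graph

def kmeans_best_of(X, K, replicates=30, max_iter=1000, rng=None):
    """Plain Lloyd's K-means, keeping the best of `replicates` random
    starts -- mirroring MATLAB kmeans' Replicates/MaxIter options."""
    rng = np.random.default_rng(rng)
    best_labels, best_obj = None, np.inf
    for _ in range(replicates):
        C = X[rng.choice(len(X), K, replace=False)]  # random initial centers
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            C_new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                              else C[j] for j in range(K)])
            if np.allclose(C_new, C):            # converged
                break
            C = C_new
        obj = d2[np.arange(len(X)), labels].sum()
        if obj < best_obj:
            best_obj, best_labels = obj, labels
    return best_labels, best_obj
```

A typical use, matching the table's settings, would be to project the data to one of the tested embedding dimensions (e.g. m = 20), build the k = 5 weight graph on the projected data, and run the K-means baseline with 30 replicates.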