Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm
Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared with the existing first-order methods. |
| Researcher Affiliation | Academia | Defeng Sun, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Kim-Chuan Toh, Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076; Yancheng Yuan, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong |
| Pseudocode | Yes | Algorithm 1 (Ssnal for (P)) — Initialization: choose (X^0, U^0) ∈ R^{d×n} × R^{d×\|E\|}, Z^0 ∈ R^{d×\|E\|}, σ_0 > 0, and a summable nonnegative sequence {ε_k}; repeat Step 1: compute ... Algorithm 2 (Ssncg for (21)) — Initialization: given X^0 ∈ R^{d×n}, µ ∈ (0, 1/2), τ ∈ (0, 1], and η, δ ∈ (0, 1); for j = 0, 1, ..., repeat ... Algorithm 3 (iADMM for (P)) — Initialization: choose σ > 0, (X^0, U^0, Z^0) ∈ R^{d×n} × R^{d×\|E\|} × R^{d×\|E\|}, and a summable nonnegative sequence {ε_k}; for k = 0, 1, ..., repeat ... |
| Open Source Code | No | We will compare our algorithm with the open source software cvxclustr in (Chi and Lange, 2015), which is an R package with key functions written in C. We write our code in Matlab without any dedicated C functions. |
| Open Datasets | Yes | In this section, we show the performance of our algorithm Ssnal on three simulated data sets: Two Half-Moon, Unbalanced Gaussian (Rezaei and Fränti, 2016) and semi-spherical shells data. ... we compare the performance of our proposed Ssnal with AMA on some real data sets, namely, MNIST, Fisher Iris, WINE, Yale Face B (10Train subset). |
| Dataset Splits | No | The paper describes using simulated data (Two Half-Moon, Unbalanced Gaussian, semi-spherical shells) and real datasets (MNIST, Fisher Iris, WINE, Yale Face B). However, for these datasets, no specific information regarding training, validation, or test splits, or the methodology for creating such splits, is provided. The experiments focus on evaluating clustering performance directly. |
| Hardware Specification | Yes | All our computational results are obtained from a desktop having 16 cores with 32 Intel Xeon E5-2650 processors at 2.6 GHz and 64 GB memory. |
| Software Dependencies | No | We write our code in Matlab without any dedicated C functions. No specific version number for Matlab or any other software dependencies is provided. |
| Experiment Setup | Yes | In our experiments, we choose the weight w_ij as follows: w_ij = exp(−0.5 ‖a_i − a_j‖²) if (i, j) ∈ E, and w_ij = 0 otherwise, where E = ⋃_{i=1}^{n} {(i, j) \| a_j is among a_i's 30 nearest neighbors, i < j ≤ n} ∪ ⋃_{α=1}^{5} {(i, j) \| i, j ∈ I_α, i < j}. ... In the experiments, we choose k = 10, φ = 0.5 (for the weights w_ij) and γ ∈ [0.2 : 0.2 : 10] (in Matlab notation) to generate the clustering path. ... In our experiments, we set ε = 10⁻⁶ unless specified otherwise. |
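The k-nearest-neighbor Gaussian weight scheme quoted in the Experiment Setup row can be sketched in a few lines. This is an illustrative reimplementation, not the authors' Matlab code: the function name `knn_gaussian_weights`, the dict-of-edges representation, and the brute-force distance computation are all assumptions made for clarity.

```python
import numpy as np

def knn_gaussian_weights(A, k=10, phi=0.5):
    """Build the k-nearest-neighbor edge set E and Gaussian weights
    w_ij = exp(-phi * ||a_i - a_j||^2) for (i, j) in E, i < j,
    following the weight scheme quoted above.
    A is d x n (columns are the data points a_1, ..., a_n)."""
    d, n = A.shape
    # squared Euclidean distances between all pairs of columns
    sq = ((A[:, :, None] - A[:, None, :]) ** 2).sum(axis=0)
    weights = {}
    for i in range(n):
        # indices of the k nearest neighbors of a_i (excluding a_i itself)
        order = np.argsort(sq[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            lo, hi = min(i, j), max(i, j)  # store each edge once, with i < j
            weights[(lo, hi)] = np.exp(-phi * sq[lo, hi])
    return weights
```

With `phi = 0.5` this reproduces the quoted exp(−0.5‖a_i − a_j‖²) weights; the union with the index sets I_α in the quoted definition is experiment-specific and is omitted here.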
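For concreteness, a plain ADMM iteration for the convex clustering model (P) — in the spirit of the Algorithm 3 (iADMM) row above, but without the inexactness control {ε_k} and without the semismooth-Newton subproblem solver that distinguishes Ssnal — might look like the following sketch. The function name, the dense incidence matrix, and the fixed iteration count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def convex_cluster_admm(A, weights, gamma=1.0, sigma=1.0, iters=500):
    """A minimal ADMM sketch for the convex clustering model
        min_X  0.5 * sum_i ||x_i - a_i||^2
               + gamma * sum_{(i,j) in E} w_ij * ||x_i - x_j||_2,
    using the splitting U_ij = x_i - x_j.
    A is d x n; `weights` maps edges (i, j), i < j, to w_ij > 0."""
    d, n = A.shape
    edges = sorted(weights)
    m = len(edges)
    # incidence matrix B: row l of (B X^T)^T holds x_i - x_j for edge l = (i, j)
    B = np.zeros((m, n))
    for l, (i, j) in enumerate(edges):
        B[l, i], B[l, j] = 1.0, -1.0
    U = np.zeros((d, m))
    Z = np.zeros((d, m))
    # X-update solves X (I + sigma * B^T B) = A + (sigma*U - Z) B
    M = np.eye(n) + sigma * (B.T @ B)
    for _ in range(iters):
        rhs = A + (sigma * U - Z) @ B
        X = np.linalg.solve(M, rhs.T).T      # M is symmetric, so M^T = M
        BX = X @ B.T                          # columns: x_i - x_j per edge
        # U-update: blockwise soft-thresholding (prox of t * ||.||_2)
        V = BX + Z / sigma
        for l, e in enumerate(edges):
            t = gamma * weights[e] / sigma
            nrm = np.linalg.norm(V[:, l])
            U[:, l] = max(0.0, 1.0 - t / nrm) * V[:, l] if nrm > 0 else 0.0
        # dual update
        Z = Z + sigma * (BX - U)
    return X
```

For a sufficiently large γ, the fusion penalty drives connected points to a common centroid, which is the mechanism behind the clustering path γ ∈ [0.2 : 0.2 : 10] quoted above.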