Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm

Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan

JMLR 2021

Reproducibility Variable — Result — LLM Response
Research Type — Experimental
"Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to the existing first-order methods."
Researcher Affiliation — Academia
"Defeng Sun, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Kim-Chuan Toh, Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076; Yancheng Yuan, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong"
Pseudocode — Yes
"Algorithm 1 (Ssnal for (P)). Initialization: choose (X^0, U^0) ∈ R^{d×n} × R^{d×|E|}, Z^0 ∈ R^{d×|E|}, σ_0 > 0, and a summable nonnegative sequence {ϵ_k}; repeat: Step 1. Compute ... Algorithm 2 (Ssncg for (21)). Initialization: given X^0 ∈ R^{d×n}, µ ∈ (0, 1/2), τ ∈ (0, 1], and η, δ ∈ (0, 1); for j = 0, 1, ..., repeat ... Algorithm 3 (iAdmm for (P)). Initialization: choose σ > 0, (X^0, U^0, Z^0) ∈ R^{d×n} × R^{d×|E|} × R^{d×|E|}, and a summable nonnegative sequence {ϵ_k}; for k = 0, 1, ..., repeat ..."
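The algorithms above all target the convex clustering problem (P), whose objective combines a least-squares fit to the data points a_i with a weighted 2-norm fusion penalty over the edge set E: (1/2) Σ_i ‖x_i − a_i‖² + γ Σ_{(i,j)∈E} w_ij ‖x_i − x_j‖. A minimal sketch of this objective, assuming a dense (d, n) layout and a plain edge list (the function name and data representation are illustrative, not taken from the authors' Matlab code):

```python
import numpy as np

def convex_clustering_objective(X, A, edges, weights, gamma):
    """Evaluate the convex clustering objective of problem (P):
    0.5 * sum_i ||x_i - a_i||^2 + gamma * sum_{(i,j) in E} w_ij * ||x_i - x_j||_2.

    X, A    : (d, n) arrays of cluster centers and data points (columns).
    edges   : list of index pairs (i, j) forming the edge set E.
    weights : sequence of nonnegative weights w_ij, aligned with edges.
    """
    fit = 0.5 * np.sum((X - A) ** 2)
    reg = sum(w * np.linalg.norm(X[:, i] - X[:, j])
              for (i, j), w in zip(edges, weights))
    return fit + gamma * reg
```

Setting X = A makes the fit term vanish and leaves only the fusion penalty, which is a quick sanity check when wiring up a solver.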
Open Source Code — No
"We will compare our algorithm with the open source software cvxclustr in (Chi and Lange, 2015), which is an R package with key functions written in C. We write our code in Matlab without any dedicated C functions."
Open Datasets — Yes
"In this section, we show the performance of our algorithm Ssnal on three simulated data sets: Two Half-Moon, Unbalanced Gaussian (Rezaei and Fränti, 2016) and semi-spherical shells data. ... we compare the performance of our proposed Ssnal with AMA on some real data sets, namely, MNIST, Fisher Iris, WINE, Yale Face B (10 Train subset)."
Dataset Splits — No
The paper describes using simulated data (Two Half-Moon, Unbalanced Gaussian, semi-spherical shells) and real datasets (MNIST, Fisher Iris, WINE, Yale Face B). However, for these datasets, no specific information regarding training, validation, or test splits, or the methodology for creating such splits, is provided. The experiments evaluate clustering performance directly.
Hardware Specification — Yes
"All our computational results are obtained from a desktop having 16 cores with 32 Intel Xeon E5-2650 processors at 2.6 GHz and 64 GB memory."
Software Dependencies — No
"We write our code in Matlab without any dedicated C functions." No specific version number for Matlab or any other software dependency is provided.
Experiment Setup — Yes
"In our experiments, we choose the weight w_ij as follows: w_ij = exp(−0.5 ‖a_i − a_j‖²) if (i, j) ∈ E, and 0 otherwise, where E = ∪_{i=1}^{n} {(i, j) | a_j is among a_i's 30-nearest neighbors, i < j ≤ n} ∪ (∪_{α=1}^{5} {(i, j) | i, j ∈ I_α, i < j}). ... In the experiments, we choose k = 10, φ = 0.5 (for the weights w_ij) and γ ∈ [0.2 : 0.2 : 10] (in Matlab notation) to generate the clustering path. ... In our experiments, we set ϵ = 10⁻⁶ unless specified otherwise."
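The weight construction quoted above — a Gaussian kernel w_ij = exp(−φ ‖a_i − a_j‖²) restricted to k-nearest-neighbor pairs — can be sketched as follows. This is an illustrative Python version, not the authors' Matlab code; it builds only the k-nearest-neighbor part of E and omits the paper's additional within-group edges (i, j ∈ I_α) used for the simulated data:

```python
import numpy as np

def knn_gaussian_weights(A, k=10, phi=0.5):
    """Build sparse edge weights for convex clustering:
    w_ij = exp(-phi * ||a_i - a_j||^2) when a_j is among a_i's
    k nearest neighbors (stored with i < j), and 0 otherwise.

    A : (n, d) data matrix, one point per row.
    Returns a dict mapping edge (i, j) with i < j to its weight.
    """
    n = A.shape[0]
    # Pairwise squared Euclidean distances, (n, n).
    d2 = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
    weights = {}
    for i in range(n):
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            a, b = (i, j) if i < j else (j, i)
            weights[(a, b)] = np.exp(-phi * d2[a, b])
    return weights
```

Because each edge is stored once with i < j, mutual neighbor pairs are deduplicated automatically; with φ = 0.5 this reproduces the exp(−0.5 ‖a_i − a_j‖²) weights of the quoted setup.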