Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm

Authors: Defeng Sun, Kim-Chuan Toh, Yancheng Yuan

JMLR 2021

Reproducibility Variable — Result — LLM Response
Research Type — Experimental
"Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to the existing first-order methods."
Researcher Affiliation — Academia
"Defeng Sun, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Kim-Chuan Toh, Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076; Yancheng Yuan, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong"
Pseudocode — Yes
"Algorithm 1 (Ssnal for (P)). Initialization: choose (X^0, U^0) ∈ R^{d×n} × R^{d×|E|}, Z^0 ∈ R^{d×|E|}, σ_0 > 0, and a summable nonnegative sequence {ϵ_k}; repeat: Step 1. Compute ... Algorithm 2 (Ssncg for (21)). Initialization: given X^0 ∈ R^{d×n}, µ ∈ (0, 1/2), τ ∈ (0, 1], and η, δ ∈ (0, 1); for j = 0, 1, ..., repeat ... Algorithm 3 (iAdmm for (P)). Initialization: choose σ > 0, (X^0, U^0, Z^0) ∈ R^{d×n} × R^{d×|E|} × R^{d×|E|}, and a summable nonnegative sequence {ϵ_k}; for k = 0, 1, ..., repeat ..."
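The algorithms above all target the convex clustering problem (P), whose objective combines a least-squares fit to the data points a_i with a weighted 2-norm fusion penalty over the edge set E: (1/2) Σ_i ‖x_i − a_i‖² + γ Σ_{(i,j)∈E} w_ij ‖x_i − x_j‖. A minimal sketch of this objective, assuming a dense (d, n) layout and a plain edge list (the function name and data representation are illustrative, not taken from the authors' Matlab code):

```python
import numpy as np

def convex_clustering_objective(X, A, edges, weights, gamma):
    """Evaluate the convex clustering objective of problem (P):
    0.5 * sum_i ||x_i - a_i||^2 + gamma * sum_{(i,j) in E} w_ij * ||x_i - x_j||_2.

    X, A    : (d, n) arrays of cluster centers and data points (columns).
    edges   : list of index pairs (i, j) forming the edge set E.
    weights : sequence of nonnegative weights w_ij, aligned with edges.
    """
    fit = 0.5 * np.sum((X - A) ** 2)
    reg = sum(w * np.linalg.norm(X[:, i] - X[:, j])
              for (i, j), w in zip(edges, weights))
    return fit + gamma * reg
```

Setting X = A makes the fit term vanish and leaves only the fusion penalty, which is a quick sanity check when wiring up a solver.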
Open Source Code — No
"We will compare our algorithm with the open source software cvxclustr in (Chi and Lange, 2015), which is an R package with key functions written in C. We write our code in Matlab without any dedicated C functions."
Open Datasets — Yes
"In this section, we show the performance of our algorithm Ssnal on three simulated data sets: Two Half-Moon, Unbalanced Gaussian (Rezaei and Fränti, 2016) and semi-spherical shells data. ... we compare the performance of our proposed Ssnal with AMA on some real data sets, namely, MNIST, Fisher Iris, WINE, Yale Face B (10 Train subset)."
Dataset Splits — No
The paper describes using simulated data (Two Half-Moon, Unbalanced Gaussian, semi-spherical shells) and real datasets (MNIST, Fisher Iris, WINE, Yale Face B). However, for these datasets, no specific information regarding training, validation, or test splits, or the methodology for creating such splits, is provided. The experiments evaluate clustering performance directly.
Hardware Specification — Yes
"All our computational results are obtained from a desktop having 16 cores with 32 Intel Xeon E5-2650 processors at 2.6 GHz and 64 GB memory."
Software Dependencies — No
"We write our code in Matlab without any dedicated C functions." No specific version number for Matlab or any other software dependency is provided.
Experiment Setup — Yes
"In our experiments, we choose the weight w_ij as follows: w_ij = exp(−0.5 ‖a_i − a_j‖²) if (i, j) ∈ E, and 0 otherwise, where E = ∪_{i=1}^{n} {(i, j) | a_j is among a_i's 30-nearest neighbors, i < j ≤ n} ∪ (∪_{α=1}^{5} {(i, j) | i, j ∈ I_α, i < j}). ... In the experiments, we choose k = 10, φ = 0.5 (for the weights w_ij) and γ ∈ [0.2 : 0.2 : 10] (in Matlab notation) to generate the clustering path. ... In our experiments, we set ϵ = 10⁻⁶ unless specified otherwise."
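The weight construction quoted above — a Gaussian kernel w_ij = exp(−φ ‖a_i − a_j‖²) restricted to k-nearest-neighbor pairs — can be sketched as follows. This is an illustrative Python version, not the authors' Matlab code; it builds only the k-nearest-neighbor part of E and omits the paper's additional within-group edges (i, j ∈ I_α) used for the simulated data:

```python
import numpy as np

def knn_gaussian_weights(A, k=10, phi=0.5):
    """Build sparse edge weights for convex clustering:
    w_ij = exp(-phi * ||a_i - a_j||^2) when a_j is among a_i's
    k nearest neighbors (stored with i < j), and 0 otherwise.

    A : (n, d) data matrix, one point per row.
    Returns a dict mapping edge (i, j) with i < j to its weight.
    """
    n = A.shape[0]
    # Pairwise squared Euclidean distances, (n, n).
    d2 = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
    weights = {}
    for i in range(n):
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            a, b = (i, j) if i < j else (j, i)
            weights[(a, b)] = np.exp(-phi * d2[a, b])
    return weights
```

Because each edge is stored once with i < j, mutual neighbor pairs are deduplicated automatically; with φ = 0.5 this reproduces the exp(−0.5 ‖a_i − a_j‖²) weights of the quoted setup.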