Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
Authors: Ziwen Wang, Yancheng Yuan, Jiaming Ma, Tieyong Zeng, Defeng Sun
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical results demonstrate the robustness and superior performance of the randomly projected convex clustering model. The numerical results will also demonstrate that the randomly projected convex clustering model can outperform other popular clustering models on the dimension-reduced data, including the randomly projected K-means model. ... Extensive numerical experiment results will be presented in this paper to justify the theoretical guarantees and to demonstrate the superior performance and robustness of the proposed model. |
| Researcher Affiliation | Academia | Ziwen Wang, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong; Yancheng Yuan, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Jiaming Ma, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong; Tieyong Zeng, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong; Defeng Sun, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong |
| Pseudocode | No | The paper includes figures (Figure 2, Figure 3) showing the overall structure and procedures, but these are flowcharts/diagrams and not structured pseudocode or algorithm blocks. It refers to algorithms from other papers (e.g., AS-SSNAL) but does not present their pseudocode here. |
| Open Source Code | No | The paper does not contain any explicit statements from the authors releasing their code for the methodology described. It mentions using open-source implementations for baseline comparisons (e.g., k-means2 function, spectralcluster3 function), but this refers to third-party tools, not the authors' own implementation. |
| Open Datasets | Yes | Datasets: We conduct tests on the following real datasets: LIBRAS and LIBRAS-6 (Dias et al., 2009a), COIL-20 (Nene et al., 1996), LUNG (Lee et al., 2010), and MNIST (Le Cun et al., 1998). |
| Dataset Splits | No | The paper describes how simulated data was generated and mentions evaluating clustering paths over sequences of gamma values. However, it does not provide specific train/test/validation splits (percentages, counts, or references to predefined splits) for reproducibility of the data partitioning in its experiments on either simulated or real datasets. |
| Hardware Specification | Yes | All our computational results were obtained using MATLAB on a Windows laptop (Intel Core i7-8750H @ 2.20GHz, RAM 32GB). |
| Software Dependencies | No | The paper states that results were obtained using MATLAB but does not specify a MATLAB version number, nor does it list versions of any libraries, toolboxes, or external solvers, which would be necessary for full reproducibility. It refers to functions such as the 'K-means2' and 'spectralcluster3' functions, but these are MATLAB functions cited without explicit version information as external dependencies. |
| Experiment Setup | Yes | We adopt the stopping criterion in (Yuan et al., 2022) with a tolerance ϵtol = 10^-7. For more details of the algorithm, one can refer to (Yuan et al., 2022) and the references therein. We obtain a clustering path based on the inexact checking rule up to the tolerance ϵclust = 10^-5. ... We adopt the popular choice of weights constructed by the Gaussian kernel (42) with a k-nearest neighbors graph with k = 5 (Chi and Lange, 2015; Yuan et al., 2018; Sun et al., 2021). ... we directly test different scales of embedding dimensions m ∈ {10, 20, 50, 100, 200}. ... For LUNG, we set γ ∈ [6 : -0.01 : 0.1]. ... For (RP) KM++: We use the K-means2 function with Number of clusters = K... We set Max Iter = 1000 and Replicates = 30. |
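The experiment-setup row above names three reproducible ingredients: a random projection to m dimensions, Gaussian-kernel weights on a k-nearest-neighbors graph with k = 5, and a K-means baseline run with MaxIter = 1000 and Replicates = 30. The sketch below is an illustrative Python reconstruction of those ingredients, not the authors' MATLAB code; the function names and the kernel bandwidth `phi` are assumptions introduced here for illustration.

```python
import numpy as np

def random_projection(X, m, rng=None):
    """Project n x d data X to n x m with a Gaussian random matrix
    (Johnson-Lindenstrauss-style 1/sqrt(m) scaling)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    P = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ P

def gaussian_knn_weights(X, k=5, phi=0.5):
    """Symmetric k-NN graph with Gaussian kernel weights
    w_ij = exp(-phi * ||x_i - x_j||^2).  The bandwidth `phi` is an
    assumed value; the paper's choice is not quoted in this table."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]        # skip self (distance 0)
        W[i, idx] = np.exp(-phi * d2[i, idx])
    return np.maximum(W, W.T)                    # symmetrize the graph

def kmeans_best_of(X, K, replicates=30, max_iter=1000, rng=None):
    """Plain Lloyd's K-means, keeping the best of `replicates` random
    starts -- mirroring MATLAB kmeans' Replicates/MaxIter options."""
    rng = np.random.default_rng(rng)
    best_labels, best_obj = None, np.inf
    for _ in range(replicates):
        C = X[rng.choice(len(X), K, replace=False)]  # random initial centers
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            C_new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                              else C[j] for j in range(K)])
            if np.allclose(C_new, C):            # converged
                break
            C = C_new
        obj = d2[np.arange(len(X)), labels].sum()
        if obj < best_obj:
            best_obj, best_labels = obj, labels
    return best_labels, best_obj
```

A typical use, matching the table's settings, would be to project the data to one of the tested embedding dimensions (e.g. m = 20), build the k = 5 weight graph on the projected data, and run the K-means baseline with 30 replicates.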