reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributed Kernel-Driven Data Clustering

Authors: Ioannis Schizas

JMLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Detailed numerical examples utilizing both synthetic and real data demonstrate that the distributed novel approach can achieve clustering performance that gets close or even exceeds the one achieved by existing centralized alternatives.
Researcher Affiliation	Academia	Ioannis Schizas EMAIL US Army Combat Capabilities Development Command Army Research Lab Aberdeen Proving Ground, MD 21005, USA
Pseudocode	Yes	Algorithm 1 Centralized Joint Kernel Selection and Clustering (CKC) and Algorithm 2 Distributed Kernel Selection and Clustering (DKC)
Open Source Code	No	The paper does not provide an explicit statement of code release or a link to a code repository for the described methodology.
Open Datasets	Yes	The Unimib dataset (Micucci et al., 2017) corresponding to a collection of smartphone-based human activity detection readings with Q = 3 different classes; (2) The Salinas dataset in (Sal, 2021) which consists of 3-dimensional hyperspectral images with Q = 4; and Hyperspectral remote sensing scenes. Available: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes, 2021.
Dataset Splits	No	The paper describes the composition of the datasets used (e.g., '20 sensing units acquire data entries...corresponding to walking'), but does not provide specific training/test/validation dataset splits or mention cross-validation setup.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments or training the models.
Software Dependencies	No	The paper mentions using 'interior point methods' (e.g., Boyd and Vandenberghe, 2004; Grant and Boyd, 2014) for solving SDPs, implying software like CVX, but it does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	The parameters for DKC were set as v = 0.01, µ = 10, ω = 2, ξ = 15 and c = 0.01. All methods in this synthetic example are able to reach accuracy, NMI and purity equal to one, though the novel DKC approach does not require the presence of a central processor. Note that as the number of ADMM iterations increases the rate of convergence goes up; especially when increasing from ρt = 1 to ρt = 3 beyond which the rate gains are negligible. Kmax denotes the user-defined maximum number of DCA iterations employed during coordinate descent iteration τ, in case it takes too long for the breaking conditions in lines 7 or 15 of Alg. 1 to be satisfied. We set this value equal to Kmax = 20 for the numerical tests conducted in the paper ensuring convergence.
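
The Experiment Setup row reports clustering quality as purity and NMI without restating how they are computed. The following is a minimal sketch of the standard textbook definitions, not the paper's code. It assumes NMI is normalized by the arithmetic mean of the two label entropies (the paper may use a different normalization). The function names and the keys of DKC_PARAMS are hypothetical labels chosen for illustration; only the numeric values come from the row above.

```python
from collections import Counter
from math import log

# Hyperparameter values quoted in the Experiment Setup row. The key names are
# hypothetical labels for this sketch, not identifiers from the paper.
DKC_PARAMS = {"v": 0.01, "mu": 10, "omega": 2, "xi": 15, "c": 0.01, "K_max": 20}


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of the empirical label distribution."""
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def nmi(true_labels, cluster_labels):
    """Mutual information divided by the mean of the two entropies (assumed convention)."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    n_true = Counter(true_labels)
    n_clus = Counter(cluster_labels)
    # p(t, c) * log( p(t, c) / (p(t) * p(c)) ), written in terms of raw counts
    mi = sum(
        (k / n) * log((k * n) / (n_true[t] * n_clus[c]))
        for (t, c), k in joint.items()
    )
    denom = 0.5 * (entropy(true_labels) + entropy(cluster_labels))
    # Both labelings constant: treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    relabeled = [2, 2, 0, 0, 1, 1]  # same partition, different cluster ids
    print(purity(truth, relabeled), nmi(truth, relabeled))
```

Both metrics are invariant to how cluster ids are permuted, which is why a clustering that recovers the true partition under different ids still scores one on each, consistent with the "equal to one" result quoted for the synthetic example.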