reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Comparing Hard and Overlapping Clusterings

Authors: Danilo Horta, Ricardo J.G.B. Campello

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all the four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture. Implementation and supplementary figures can be found at http://sn.im/25a9h8u. Keywords: overlapping, fuzzy, probabilistic, clustering evaluation
Researcher Affiliation	Academia	Danilo Horta EMAIL Ricardo J. G. B. Campello EMAIL Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Campus de São Carlos, Caixa Postal 668, 13560-970, São Carlos-SP, Brazil
Pseudocode	Yes	Algorithm 1 Compute E[ a]U,V ... Algorithm 2 Stability assessment
Open Source Code	Yes	Implementation and supplementary figures can be found at http://sn.im/25a9h8u.
Open Datasets	Yes	We applied 13AGRI and ARI to evaluate fuzzy c-means (Bezdek, 1981) and k-means (MacQueen, 1967) solutions, respectively, over 14 real data sets from UCI repository (Newman and Asuncion, 2010).
Dataset Splits	Yes	Subsamples were generated by randomly selecting 80% of the data set objects, without replacement, as in (Monti et al., 2003).
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments. It mentions the execution of algorithms and experiments but no information about CPU, GPU, or other hardware components.
Software Dependencies	No	The paper mentions several algorithms (k-means, fuzzy c-means, EMGM, SUBCLU, IPCM2) that were used, but it does not provide specific software dependency versions (e.g., programming language, libraries, or frameworks with version numbers) needed to replicate the experiments.
Experiment Setup	Yes	The FCM and IPCM2 exponent m was set to 2 (which is commonly adopted in the literature), the SUBCLU parameter minpts was set to 5, and the Euclidean norm was adopted; this same configuration was used in all the experiments reported in this work.
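
The Dataset Splits and Research Type rows mention two mechanical steps that can be illustrated briefly: drawing a subsample of 80% of the objects without replacement, and scoring agreement between two hard exclusive clusterings with ARI. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `subsample_indices` and `adjusted_rand_index` are hypothetical, the ARI shown is the standard pair-counting (Hubert and Arabie) form for hard clusterings only, and 13AGRI itself is not reproduced here.

```python
import random
from collections import Counter
from math import comb


def subsample_indices(n_objects, fraction=0.8, rng=None):
    """Return sorted indices of a random subsample drawn without replacement.

    Hypothetical helper: the subsample size is round(fraction * n_objects).
    """
    rng = rng or random.Random()
    size = round(fraction * n_objects)
    return sorted(rng.sample(range(n_objects), size))


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two hard exclusive clusterings.

    labels_a[i] and labels_b[i] are the cluster labels of object i.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to disagree on

    total_pairs = comb(n, 2)
    # Pairs placed together in both clusterings (contingency table cells).
    together_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    # Pairs placed together in each clustering separately (table margins).
    together_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    together_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both clusterings are trivial
    return (together_both - expected) / (maximum - expected)
```

To compare clusterings obtained on two different subsamples, one option (again an assumption, not a description of the paper's Algorithm 2) is to restrict both label lists to the objects the two subsamples share before calling `adjusted_rand_index`.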