Comparing Hard and Overlapping Clusterings
Authors: Danilo Horta, Ricardo J.G.B. Campello
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all the four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture. Implementation and supplementary figures can be found at http://sn.im/25a9h8u. Keywords: overlapping, fuzzy, probabilistic, clustering evaluation |
| Researcher Affiliation | Academia | Danilo Horta EMAIL Ricardo J. G. B. Campello EMAIL Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Campus de São Carlos, Caixa Postal 668, 13560-970, São Carlos-SP, Brazil |
| Pseudocode | Yes | Algorithm 1 Compute E[ a]U,V ... Algorithm 2 Stability assessment |
| Open Source Code | Yes | Implementation and supplementary figures can be found at http://sn.im/25a9h8u. |
| Open Datasets | Yes | We applied 13AGRI and ARI to evaluate fuzzy c-means (Bezdek, 1981) and k-means (MacQueen, 1967) solutions, respectively, over 14 real data sets from UCI repository (Newman and Asuncion, 2010). |
| Dataset Splits | Yes | Subsamples were generated by randomly selecting 80% of the data set objects, without replacement, as in (Monti et al., 2003). |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. It mentions the execution of algorithms and experiments but no information about CPU, GPU, or other hardware components. |
| Software Dependencies | No | The paper mentions several algorithms (k-means, fuzzy c-means, EMGM, SUBCLU, IPCM2) that were used, but it does not provide specific software dependency versions (e.g., programming language, libraries, or frameworks with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | The FCM and IPCM2 exponent m was set to 2 (which is commonly adopted in the literature), the SUBCLU parameter minpts was set to 5, and the Euclidean norm was adopted; this same configuration was used in all the experiments reported in this work. |
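
The Dataset Splits row describes drawing subsamples of 80% of the objects without replacement, and the Research Type row refers to ARI on hard exclusive clusterings. The following is a minimal sketch of those two pieces only, assuming the standard Hubert-Arabie form of ARI; it is not the paper's Algorithm 2 and it does not implement 13AGRI. The function names `subsample_indices` and `adjusted_rand_index` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter
from math import comb


def subsample_indices(n_objects, fraction=0.8, rng=None):
    """Pick round(fraction * n_objects) distinct object indices (no replacement)."""
    if rng is None:
        rng = random.Random()
    size = int(round(fraction * n_objects))
    # random.sample draws without replacement, so indices are unique.
    return sorted(rng.sample(range(n_objects), size))


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two hard, exclusive clusterings of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    # Pair counts within contingency cells, rows, and columns.
    cell_pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    row_pairs = sum(comb(c, 2) for c in Counter(labels_a).values())
    col_pairs = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two objects counts as agreement
    expected = row_pairs * col_pairs / total_pairs
    maximum = (row_pairs + col_pairs) / 2
    if maximum == expected:
        return 1.0  # assumption: degenerate case (e.g. all singletons) is agreement
    return (cell_pairs - expected) / (maximum - expected)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the sketch is repeatable
    idx = subsample_indices(10, fraction=0.8, rng=rng)
    print(len(idx), "of 10 objects kept")
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In a stability study, one would cluster each subsample separately and compare the resulting clusterings on the objects they share; how the paper does this for fuzzy and overlapping solutions is specified in its Algorithm 2 and is not reproduced here.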