Adjusting for Chance Clustering Comparison Measures

Authors: Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here we show that our adjusted generalized IT measures have a baseline value of 0 when comparing random partitions U and V. In Figure 3 we show the behavior of AMIq, ARI, and AMI on the same experiment proposed in Section 2.2. They are all close to 0 with negligible variation when the partitions are random and independent. Moreover, it is interesting to see the equivalence of AMI2 and ARI. On the other hand, the equivalence of AMIq and AMI with Shannon entropy is obtained only in the limit q → 1. (...) In this section, we evaluate the performance of standardized measures on selection bias correction when partitions U are generated at random and independently from the reference partition V."
Researcher Affiliation | Academia | Dept. of Computing and Information Systems, The University of Melbourne, VIC, Australia.
Pseudocode | No | The paper primarily presents mathematical derivations, theorems, and proofs. It describes its methods and computations using equations and textual explanation rather than structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All code has been made available online." https://sites.google.com/site/adjgenit/
Open Datasets | No | The paper uses synthetic data generated for its experiments: "Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others." It does not refer to any external, publicly available datasets.
Dataset Splits | No | The paper describes generating random partitions for experimental simulations (e.g., "randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others"). This is not a description of training/validation/test splits of the kind commonly reported for reproducibility in machine learning experiments.
Hardware Specification | No | "Experiments were carried out on Amazon cloud supported by AWS in Education Grant Award." No specific CPU or GPU models, or detailed cloud instance types, are mentioned beyond "Amazon cloud" and "AWS".
Software Dependencies | No | The paper does not name any specific software dependencies such as libraries, frameworks, or solvers, with or without version numbers.
Experiment Setup | Yes | "Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others. The average value of NMIq over 1,000 simulations for different values of q is shown in Figure 2. (...) Given a reference partition V on N = 100 objects with c = 4 sets, we generate a pool of random partitions U with r ranging from 2 to 10 sets. Then, we use NMIq(U, V) to select the closest partition to the reference V. The plot at the bottom of Figure 10 shows the probability of selection of a partition U with r sets, using NMIq computed on 5,000 simulations."
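The zero-baseline behavior quoted under Research Type can be checked with off-the-shelf measures. Below is a minimal sketch, assuming scikit-learn's `adjusted_rand_score` and `adjusted_mutual_info_score` as stand-ins for the paper's ARI and AMI, and iid uniform label assignment as a simplified version of the paper's random uniform partitions:

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

rng = np.random.default_rng(0)
N = 100  # number of objects, as in the paper's simulations

def random_labels(n, k, rng):
    """Assign each of n objects an iid uniform label in {0, ..., k-1}."""
    return rng.integers(0, k, size=n)

ari_vals, ami_vals = [], []
for _ in range(200):  # fewer simulations than the paper's 1,000, for speed
    U = random_labels(N, 10, rng)  # random partition with up to 10 sets
    V = random_labels(N, 6, rng)   # independent partition with up to 6 sets
    ari_vals.append(adjusted_rand_score(U, V))
    ami_vals.append(adjusted_mutual_info_score(U, V))

mean_ari = float(np.mean(ari_vals))
mean_ami = float(np.mean(ami_vals))
print(f"mean ARI = {mean_ari:.4f}, mean AMI = {mean_ami:.4f}")
# Both means sit close to 0 for independent random partitions,
# which is the chance-adjusted baseline the paper demonstrates.
```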
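The selection-bias setup quoted under Experiment Setup can be sketched in the same spirit. Here unadjusted Shannon NMI (scikit-learn's `normalized_mutual_info_score`, i.e. roughly the q → 1 case of NMIq) stands in for the paper's measure, and the trial count is reduced from 5,000 for speed:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
N, c = 100, 4
V = rng.integers(0, c, size=N)  # fixed reference partition with c = 4 sets

# How often is a random partition with r sets selected as "closest" to V?
counts = {r: 0 for r in range(2, 11)}
for _ in range(500):
    # Pool of independent random partitions U with r = 2, ..., 10 sets.
    pool = [(r, rng.integers(0, r, size=N)) for r in range(2, 11)]
    # Select the pool member with the highest (unadjusted) NMI against V.
    best_r, _ = max(pool, key=lambda p: normalized_mutual_info_score(V, p[1]))
    counts[best_r] += 1

print(counts)
```

Even though every candidate is random and independent of V, an unadjusted measure tends to select partitions with more sets, which is the selection bias that the paper's standardized measures are designed to correct.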