Adjusting for Chance Clustering Comparison Measures
Authors: Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we show that our adjusted generalized IT measures have a baseline value of 0 when comparing random partitions U and V. In Figure 3 we show the behavior of AMIq, ARI, and AMI on the same experiment proposed in Section 2.2. They are all close to 0 with negligible variation when the partitions are random and independent. Moreover, it is interesting to see the equivalence of AMI2 and ARI. On the other hand, the equivalence of AMIq and AMI with Shannon entropy is obtained only in the limit q → 1. (...) In this section, we evaluate the performance of standardized measures on selection bias correction when partitions U are generated at random and independently from the reference partition V. |
| Researcher Affiliation | Academia | Dept. of Computing and Information Systems, The University of Melbourne, VIC, Australia. |
| Pseudocode | No | The paper primarily presents mathematical derivations, theorems, and proofs. It describes methods and computations using equations and textual explanations, rather than structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "All code has been made available online." Footnote: https://sites.google.com/site/adjgenit/ |
| Open Datasets | No | The paper uses synthetic data generated for its experiments: "Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others." It does not refer to any external, publicly available datasets. |
| Dataset Splits | No | The paper describes generating random partitions for experimental simulations (e.g., "randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others"). This is not a train/validation/test split of the kind commonly reported for reproducibility in machine learning experiments. |
| Hardware Specification | No | The paper states only that experiments were carried out on the Amazon cloud, supported by an AWS in Education Grant Award. No GPU or CPU models, and no cloud instance types, are specified beyond "Amazon cloud" and "AWS". |
| Software Dependencies | No | The paper does not mention any specific software dependencies such as libraries, frameworks, or solvers with version numbers. |
| Experiment Setup | Yes | Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others. The average value of NMIq over 1,000 simulations for different values of q is shown in Figure 2. (...) Given a reference partition V on N = 100 objects with c = 4 sets, we generate a pool of random partitions U with r ranging from 2 to 10 sets. Then, we use NMIq(U, V) to select the closest partition to the reference V. The plot at the bottom of Figure 10 shows the probability of selection of a partition U with r sets using NMIq computed on 5,000 simulations. |
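The baseline experiment quoted above (random, independent partitions of N = 100 objects, averaged over 1,000 simulations) is straightforward to reproduce. Below is a minimal sketch, not the authors' released code: it implements the standard Adjusted Rand Index from the pair-counting contingency table and checks that its mean over random independent partitions sits near the adjusted baseline of 0. The partition sizes (N = 100, c = 6, r ∈ {2, 4, 6, 8, 10}) follow the paper's setup; the function name `ari` is our own.

```python
import math
import random
from collections import Counter

def ari(u, v):
    """Adjusted Rand Index computed from the contingency table of
    two partitions, given as equal-length label sequences."""
    n = len(u)
    cont = Counter(zip(u, v))          # cell counts n_ij
    a = Counter(u)                     # row sums a_i
    b = Counter(v)                     # column sums b_j
    sum_ij = sum(math.comb(c, 2) for c in cont.values())
    sum_a = sum(math.comb(c, 2) for c in a.values())
    sum_b = sum(math.comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / math.comb(n, 2)   # E[index] under the permutation model
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:          # degenerate case: both partitions trivial
        return 0.0
    return (sum_ij - expected) / (max_index - expected)

# Paper's setup: N = 100 objects, reference V with c = 6 sets,
# random U with r = 2, 4, 6, 8, 10 sets, 1,000 simulations each.
random.seed(0)
N, c = 100, 6
means = {}
for r in (2, 4, 6, 8, 10):
    scores = [ari([random.randrange(r) for _ in range(N)],
                  [random.randrange(c) for _ in range(N)])
              for _ in range(1000)]
    means[r] = sum(scores) / len(scores)
    print(f"r = {r:2d}: mean ARI = {means[r]:+.4f}")
```

Because the adjustment subtracts the expected index under the permutation model, each printed mean hovers around 0 regardless of the number of sets r, which is exactly the "baseline value of 0 for random partitions" property the paper establishes for its adjusted generalized IT measures as well.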