Adjusting for Chance Clustering Comparison Measures

Authors: Simone Romano, Nguyen Xuan Vinh, James Bailey, Karin Verspoor

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Here we show that our adjusted generalized IT measures have a baseline value of 0 when comparing random partitions U and V. In Figure 3 we show the behavior of AMIq, ARI, and AMI on the same experiment proposed in Section 2.2. They are all close to 0 with negligible variation when the partitions are random and independent. Moreover, it is interesting to see the equivalence of AMI2 and ARI. On the other hand, the equivalence of AMIq and AMI with Shannon entropy is obtained only in the limit q → 1. (...) In this section, we evaluate the performance of standardized measures on selection bias correction when partitions U are generated at random and independently from the reference partition V."
Researcher Affiliation | Academia | Dept. of Computing and Information Systems, The University of Melbourne, VIC, Australia.
Pseudocode | No | The paper primarily presents mathematical derivations, theorems, and proofs. It describes its methods and computations using equations and textual explanation rather than structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All code has been made available online." https://sites.google.com/site/adjgenit/
Open Datasets | No | The paper uses synthetic data generated for its experiments: "Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others." It does not refer to any external, publicly available datasets.
Dataset Splits | No | The paper describes generating random partitions for experimental simulations (e.g., "randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others"). This is not a description of training/validation/test splits of the kind commonly reported for reproducibility in machine learning experiments.
Hardware Specification | No | "Experiments were carried out on Amazon cloud supported by AWS in Education Grant Award." No specific CPU or GPU models, or detailed cloud instance types, are mentioned beyond "Amazon cloud" and "AWS".
Software Dependencies | No | The paper does not name any specific software dependencies such as libraries, frameworks, or solvers, with or without version numbers.
Experiment Setup | Yes | "Given a dataset of N = 100 objects, we randomly generate uniform partitions U with r = 2, 4, 6, 8, 10 sets and V with c = 6 sets independently of each others. The average value of NMIq over 1,000 simulations for different values of q is shown in Figure 2. (...) Given a reference partition V on N = 100 objects with c = 4 sets, we generate a pool of random partitions U with r ranging from 2 to 10 sets. Then, we use NMIq(U, V) to select the closest partition to the reference V. The plot at the bottom of Figure 10 shows the probability of selection of a partition U with r sets, using NMIq computed on 5,000 simulations."
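The zero-baseline behavior quoted under Research Type can be checked with off-the-shelf measures. Below is a minimal sketch, assuming scikit-learn's `adjusted_rand_score` and `adjusted_mutual_info_score` as stand-ins for the paper's ARI and AMI, and iid uniform label assignment as a simplified version of the paper's random uniform partitions:

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

rng = np.random.default_rng(0)
N = 100  # number of objects, as in the paper's simulations

def random_labels(n, k, rng):
    """Assign each of n objects an iid uniform label in {0, ..., k-1}."""
    return rng.integers(0, k, size=n)

ari_vals, ami_vals = [], []
for _ in range(200):  # fewer simulations than the paper's 1,000, for speed
    U = random_labels(N, 10, rng)  # random partition with up to 10 sets
    V = random_labels(N, 6, rng)   # independent partition with up to 6 sets
    ari_vals.append(adjusted_rand_score(U, V))
    ami_vals.append(adjusted_mutual_info_score(U, V))

mean_ari = float(np.mean(ari_vals))
mean_ami = float(np.mean(ami_vals))
print(f"mean ARI = {mean_ari:.4f}, mean AMI = {mean_ami:.4f}")
# Both means sit close to 0 for independent random partitions,
# which is the chance-adjusted baseline the paper demonstrates.
```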
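The selection-bias setup quoted under Experiment Setup can be sketched in the same spirit. Here unadjusted Shannon NMI (scikit-learn's `normalized_mutual_info_score`, i.e. roughly the q → 1 case of NMIq) stands in for the paper's measure, and the trial count is reduced from 5,000 for speed:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
N, c = 100, 4
V = rng.integers(0, c, size=N)  # fixed reference partition with c = 4 sets

# How often is a random partition with r sets selected as "closest" to V?
counts = {r: 0 for r in range(2, 11)}
for _ in range(500):
    # Pool of independent random partitions U with r = 2, ..., 10 sets.
    pool = [(r, rng.integers(0, r, size=N)) for r in range(2, 11)]
    # Select the pool member with the highest (unadjusted) NMI against V.
    best_r, _ = max(pool, key=lambda p: normalized_mutual_info_score(V, p[1]))
    counts[best_r] += 1

print(counts)
```

Even though every candidate is random and independent of V, an unadjusted measure tends to select partitions with more sets, which is the selection bias that the paper's standardized measures are designed to correct.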