Fair Clustering via Alignment

Authors: Kunwoong Kim, Jihu Lee, Sangchul Park, Yongdai Kim

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability." (Section 1: Introduction)
Researcher Affiliation | Academia | Department of Statistics, Seoul National University, Republic of Korea; School of Law, Seoul National University, Republic of Korea.
Pseudocode | Yes | Algorithm 1 (FCA algorithm) in Section 4.1 and Algorithm 2 (FCA-C algorithm) in Section 4.2.
Open Source Code | Yes | "Implementation code is available at https://github.com/kwkimonline/FCA."
Open Datasets | Yes | "We use three benchmark tabular datasets, ADULT (Becker & Kohavi, 1996), BANK (Moro et al., 2012), and CENSUS (Meek et al.), from the UCI Machine Learning Repository (Dua & Graff, 2017)."
Dataset Splits | No | "The number of clusters K is set to 10 for ADULT and BANK, and 20 for CENSUS, following Ziko et al. (2021). We subsample 20,000 instances for CENSUS." The paper reports these details and the sensitive-attribute distributions, but it does not specify explicit training, validation, or test splits.
Hardware Specification | Yes | "The computation is performed on several Intel Xeon Silver CPU cores and an additional RTX 4090 GPU processor." (Section C.2.1)
Software Dependencies | No | "When solving the linear program (i.e., finding the coupling matrix Γ), we use the POT library (Flamary et al., 2021). For finding cluster centers, we adopt the scikit-learn library (Pedregosa et al., 2011) to run the K-means algorithm." (Section 5.1) The libraries are named, but no version numbers are provided.
Experiment Setup | Yes | "The maximum number of iterations is set to 100, and we select the best iteration when Cost is minimized." (Section 5.1) "The value of ε is swept in increments of 0.05, ranging from 0.1 to 0.9." (Section C.2.2) "We set a learning rate of 0.005 for the CENSUS dataset with L2 normalization, and 0.05 for all other cases. To accelerate convergence, 20 gradient steps of updating the centers are performed per iteration." (Section C.3.4)
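The two building blocks named in the evidence above (an optimal-transport coupling between sensitive groups, and K-means cluster centers via scikit-learn) can be sketched as below. This is a minimal illustration on toy data, not the authors' FCA implementation: it substitutes SciPy's `linear_sum_assignment` for POT's LP solver, which is valid only in the special case of uniform marginals over two equally sized groups, and the "midpoint alignment" step is an assumed simplification.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: two sensitive groups of equal size. With uniform marginals of
# equal size, the optimal-transport LP solution reduces to a permutation
# matching, so linear_sum_assignment stands in for POT's general LP solver.
X0 = rng.normal(0.0, 1.0, size=(50, 2))   # group s = 0
X1 = rng.normal(2.0, 1.0, size=(50, 2))   # group s = 1

# Pairwise squared-Euclidean transport cost between the two groups.
C = ((X0[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)

# Optimal one-to-one matching (the permutation form of the coupling Γ).
rows, cols = linear_sum_assignment(C)

# Illustrative "aligned" representation: midpoints of matched pairs.
aligned = 0.5 * (X0[rows] + X1[cols])

# Cluster centers via scikit-learn K-means, as in Section 5.1.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(aligned)
print(km.cluster_centers_.shape)  # (3, 2)
```

Since every point in group 0 is matched to exactly one point in group 1, each resulting cluster contains matched pairs from both groups, which is the intuition behind alignment-based fair clustering.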