Fair Clustering via Alignment

Authors: Kunwoong Kim, Jihu Lee, Sangchul Park, Yongdai Kim

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability." (Section 1: Introduction)
Researcher Affiliation | Academia | Department of Statistics, Seoul National University, Republic of Korea; School of Law, Seoul National University, Republic of Korea.
Pseudocode | Yes | Algorithm 1 (FCA algorithm) in Section 4.1 and Algorithm 2 (FCA-C algorithm) in Section 4.2.
Open Source Code | Yes | "Implementation code is available at https://github.com/kwkimonline/FCA."
Open Datasets | Yes | "We use three benchmark tabular datasets, ADULT (Becker & Kohavi, 1996), BANK (Moro et al., 2012), and CENSUS (Meek et al.), from the UCI Machine Learning Repository (Dua & Graff, 2017)."
Dataset Splits | No | "The number of clusters K is set to 10 for ADULT and BANK, and 20 for CENSUS, following Ziko et al. (2021). We subsample 20,000 instances for CENSUS." The paper reports these details and the sensitive-attribute distributions, but it does not specify explicit training, validation, or test splits.
Hardware Specification | Yes | "The computation is performed on several Intel Xeon Silver CPU cores and an additional RTX 4090 GPU processor." (Section C.2.1)
Software Dependencies | No | "When solving the linear program (i.e., finding the coupling matrix Γ), we use the POT library (Flamary et al., 2021). For finding cluster centers, we adopt the scikit-learn library (Pedregosa et al., 2011) to run the K-means algorithm." (Section 5.1) The libraries are named, but no version numbers are provided.
Experiment Setup | Yes | "The maximum number of iterations is set to 100, and we select the best iteration when Cost is minimized." (Section 5.1) "The value of ε is swept in increments of 0.05, ranging from 0.1 to 0.9." (Section C.2.2) "We set a learning rate of 0.005 for the CENSUS dataset with L2 normalization, and 0.05 for all other cases. To accelerate convergence, 20 gradient steps of updating the centers are performed per iteration." (Section C.3.4)
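The two building blocks named in the evidence above (an optimal-transport coupling between sensitive groups, and K-means cluster centers via scikit-learn) can be sketched as below. This is a minimal illustration on toy data, not the authors' FCA implementation: it substitutes SciPy's `linear_sum_assignment` for POT's LP solver, which is valid only in the special case of uniform marginals over two equally sized groups, and the "midpoint alignment" step is an assumed simplification.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: two sensitive groups of equal size. With uniform marginals of
# equal size, the optimal-transport LP solution reduces to a permutation
# matching, so linear_sum_assignment stands in for POT's general LP solver.
X0 = rng.normal(0.0, 1.0, size=(50, 2))   # group s = 0
X1 = rng.normal(2.0, 1.0, size=(50, 2))   # group s = 1

# Pairwise squared-Euclidean transport cost between the two groups.
C = ((X0[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)

# Optimal one-to-one matching (the permutation form of the coupling Γ).
rows, cols = linear_sum_assignment(C)

# Illustrative "aligned" representation: midpoints of matched pairs.
aligned = 0.5 * (X0[rows] + X1[cols])

# Cluster centers via scikit-learn K-means, as in Section 5.1.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(aligned)
print(km.cluster_centers_.shape)  # (3, 2)
```

Since every point in group 0 is matched to exactly one point in group 1, each resulting cluster contains matched pairs from both groups, which is the intuition behind alignment-based fair clustering.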