Fair Clustering via Alignment
Authors: Kunwoong Kim, Jihu Lee, Sangchul Park, Yongdai Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability. (Section 1: Introduction) |
| Researcher Affiliation | Academia | 1Department of Statistics, Seoul National University, Republic of Korea 2School of Law, Seoul National University, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 FCA algorithm (Section 4.1) and Algorithm 2 FCA-C algorithm (Section 4.2). |
| Open Source Code | Yes | Implementation code is available at https://github.com/kwkimonline/FCA. |
| Open Datasets | Yes | We use three benchmark tabular datasets, ADULT (Becker & Kohavi, 1996), BANK (Moro et al., 2012), and CENSUS (Meek et al.), from the UCI Machine Learning Repository2 (Dua & Graff, 2017). |
| Dataset Splits | No | The number of clusters K is set to 10 for ADULT and BANK, and 20 for CENSUS, following Ziko et al. (2021). We subsample 20,000 instances for CENSUS. The paper mentions these details and sensitive attribute distributions, but it does not specify explicit training, validation, or test splits for the datasets. |
| Hardware Specification | Yes | The computation is performed on several Intel Xeon Silver CPU cores and an additional RTX 4090 GPU processor. (Section C.2.1) |
| Software Dependencies | No | When solving the linear program (i.e., finding the coupling matrix Γ), we use the POT library (Flamary et al., 2021). For finding cluster centers, we adopt the scikit-learn library (Pedregosa et al., 2011) to run the K-means algorithm. (Section 5.1) This text mentions software libraries but does not provide specific version numbers for them. |
| Experiment Setup | Yes | The maximum number of iterations is set to 100, and we select the best iteration when Cost is minimized. (Section 5.1) The value of ε is swept in increments of 0.05, ranging from 0.1 to 0.9. (Section C.2.2) We set a learning rate of 0.005 for the CENSUS dataset with L2 normalization, and 0.05 for all other cases. To accelerate convergence, 20 gradient steps of updating the centers are performed per iteration. (Section C.3.4) |
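The dependency rows above note that the authors solve a linear program to obtain the coupling matrix Γ (via the POT library) and run K-means with scikit-learn. As a minimal, dependency-light sketch of the coupling step only, assuming hypothetical toy data and substituting `scipy.optimize.linprog` for POT, the optimal-transport LP between two sensitive groups can be illustrated as:

```python
import numpy as np
from scipy.optimize import linprog

def ot_coupling(a, b, C):
    """Solve the optimal-transport linear program:
    minimize <Gamma, C> s.t. Gamma has row sums a, column sums b, Gamma >= 0."""
    n, m = C.shape
    # Row-sum constraints: each source mass a_i is fully transported.
    A_row = np.zeros((n, n * m))
    for i in range(n):
        A_row[i, i * m:(i + 1) * m] = 1.0
    # Column-sum constraints: each target mass b_j is fully received.
    A_col = np.zeros((m, n * m))
    for j in range(m):
        A_col[j, j::m] = 1.0
    A_eq = np.vstack([A_row, A_col])
    b_eq = np.concatenate([a, b])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m)

# Hypothetical toy data: two sensitive groups of 1-D points, uniform weights.
x0 = np.array([[0.0], [1.0]])          # group s = 0
x1 = np.array([[0.1], [0.9], [2.0]])   # group s = 1
a = np.full(len(x0), 1.0 / len(x0))
b = np.full(len(x1), 1.0 / len(x1))
C = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)  # squared-distance cost
Gamma = ot_coupling(a, b, C)
# Barycentric projection: an aligned representation of group 0 in group 1's space.
aligned0 = (Gamma @ x1) / Gamma.sum(axis=1, keepdims=True)
```

This is only an illustration of the LP the report mentions, not the authors' FCA pipeline; in practice POT's solvers scale far better than a dense `linprog` formulation, and the subsequent center updates would use scikit-learn's K-means as quoted above.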