FairDen: Fair Density-Based Clustering

Authors: Lena Krieger, Anna Beer, Pernille Matthews, Anneka Thiesson, Ira Assent

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that FairDen finds meaningful and fair clusters in extensive experiments. Section 3 (Experimental Evaluation): We measure cluster quality with DCSI (Gauss et al., 2024) and group-level fairness with generalized balance (Bera et al., 2019) (Sect. 3.2, 3.1). We study real-world benchmarks (Sect. 3.3) and compare to state-of-the-art fair clustering methods FairSC (Kleindessner et al., 2019b), normalized FairSC (Kleindessner et al., 2019b), Fairlets (Chierichetti et al., 2017), and Scalable Fair Clustering (Backurs et al., 2019) (Section 3.4).
Researcher Affiliation | Academia | (1) IAS-8: Data Analytics and Machine Learning, Forschungszentrum Jülich, Jülich, Germany; (2) Faculty of Computer Science, University of Vienna, Vienna, Austria; (3) Department of Computer Science, Aarhus University, Aarhus, Denmark
Pseudocode | Yes | Appendix A.1 (Pseudo-code): The pseudo-code for our novel fair density-based clustering method FairDen is given in Algorithm 1: Algorithm 1 FairDen
Open Source Code | Yes | Our code is available at GitHub (footnote 1: https://jugit.fz-juelich.de/ias-8/fairden).
Open Datasets | Yes | We use the common benchmark datasets for fair clustering (Chhabra et al., 2021; Le Quy et al., 2022), details shown in Table 6: The datasets Adult (Kohavi et al., 1996), Bank (Moro et al., 2014), Communities and Crime (Asuncion & Newman, 2007), and Diabetes (Strack et al., 2014) provide different scenarios in terms of dimensionality and number of sensitive groups.
Dataset Splits | No | The paper describes sampling points from datasets (e.g., "We sampled 2000 data points from the dataset" for Adult, "We sample the dataset to 5000 data points" for Bank and Diabetes), and for runtime experiments, it mentions generating datasets with DENSIRED and randomly assigning a binary sensitive attribute. However, it does not specify explicit training/test/validation splits, percentages, or predefined partitions for the benchmark datasets used in the main evaluation.
Hardware Specification | Yes | The experiments are performed on a MacBook Pro, with an M2 Pro, and 16 GB of RAM using Python 3.9. The runtime experiments are performed on a workstation with an AMD Ryzen Threadripper PRO 3955W, 250 GB RAM, and an RTX 3090.
Software Dependencies | No | The experiments are performed on a MacBook Pro, with an M2 Pro, and 16 GB of RAM using Python 3.9. While Python 3.9 is mentioned, no specific version numbers for libraries or other key software components are provided to ensure full reproducibility.
Experiment Setup | Yes | Following Schubert et al. (2017), we fix the parameter µ = 2d - 1 for FairDen and show an ablation in App. A.2. The parameters for DBSCAN, minPts and ε, were determined with a hyperparameter optimization. The criterion for the optimization is the DCSI score with a constant setting of minPts_DCSI = 5. The parameter space for minPts_DBSCAN comprises {4, 5, 10, 15, 2d - 1}, with d being the dimension of the dataset, and for ε {0.01, 0.05, 0.1, .., 0.5, 0.6, .., 2.5, 2.6, 2.8, 3, 3.25, 3.5, 3.75}. The final parameter settings are given in Table 6.
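
The DBSCAN hyperparameter search quoted under Experiment Setup can be pictured as a plain grid search over minPts and ε. The sketch below is illustrative only and is not the authors' code. It makes several assumptions: the silhouette score from scikit-learn stands in for DCSI, which scikit-learn does not provide; the ε list contains only the values spelled out in the quote and leaves the elided ranges unfilled; the synthetic blob data and the helper name grid_search_dbscan are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def grid_search_dbscan(X, min_pts_grid, eps_grid):
    """Return (best_score, best_min_pts, best_eps) over the given grid.

    ASSUMPTION: silhouette is used as the selection criterion here;
    the paper uses DCSI instead.
    """
    best = (-np.inf, None, None)
    for min_pts in min_pts_grid:
        for eps in eps_grid:
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
            mask = labels != -1  # DBSCAN marks noise points with -1
            if len(set(labels[mask])) < 2:
                continue  # silhouette needs at least two clusters
            score = silhouette_score(X[mask], labels[mask])
            if score > best[0]:
                best = (score, min_pts, eps)
    return best


# Hypothetical toy data: three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [-5, 5]],
    cluster_std=0.5,
    random_state=0,
)
d = X.shape[1]

# minPts grid as quoted: {4, 5, 10, 15, 2d - 1}.
min_pts_grid = [4, 5, 10, 15, 2 * d - 1]
# Only the explicitly listed eps values; the ".." ranges are not filled in.
eps_grid = [0.01, 0.05, 0.1, 0.5, 0.6, 2.5, 2.6, 2.8, 3, 3.25, 3.5, 3.75]

best_score, best_min_pts, best_eps = grid_search_dbscan(X, min_pts_grid, eps_grid)
print(best_score, best_min_pts, best_eps)
```

Scoring only the non-noise points is a design choice of this sketch. A criterion such as DCSI may treat noise differently, so the selected parameters can differ from those reported in Table 6.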