FairDen: Fair Density-Based Clustering
Authors: Lena Krieger, Anna Beer, Pernille Matthews, Anneka Thiesson, Ira Assent
ICLR 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that FairDen finds meaningful and fair clusters in extensive experiments. 3 EXPERIMENTAL EVALUATION We measure cluster quality with DCSI (Gauss et al., 2024) and group-level fairness with generalized balance (Bera et al., 2019) (Sect. 3.2, 3.1). We study real-world benchmarks (Sect. 3.3) and compare to state-of-the-art fair clustering methods FairSC (Kleindessner et al., 2019b), normalized FairSC (Kleindessner et al., 2019b), Fairlets (Chierichetti et al., 2017), and Scalable Fair Clustering (Backurs et al., 2019) (Section 3.4). |
| Researcher Affiliation | Academia | 1 IAS-8: Data Analytics and Machine Learning, Forschungszentrum Jülich, Jülich, Germany 2 Faculty of Computer Science, University of Vienna, Vienna, Austria 3 Department of Computer Science, Aarhus University, Aarhus, Denmark |
| Pseudocode | Yes | A.1 PSEUDO-CODE The pseudo-code for our novel fair density-based clustering method FairDen is given in Algorithm 1: Algorithm 1 FairDen |
| Open Source Code | Yes | Our code is available at GitHub (footnote 1). Footnote 1: https://jugit.fz-juelich.de/ias-8/fairden |
| Open Datasets | Yes | We use the common benchmark datasets for fair clustering (Chhabra et al., 2021; Le Quy et al., 2022), details shown in Table 6: The datasets Adult (Kohavi et al., 1996), Bank (Moro et al., 2014), Communities and Crime (Asuncion & Newman, 2007), and Diabetes (Strack et al., 2014) provide different scenarios in terms of dimensionality and number of sensitive groups. |
| Dataset Splits | No | The paper describes sampling points from datasets (e.g., "We sampled 2000 data points from the dataset" for Adult, "We sample the dataset to 5000 data points" for Bank and Diabetes), and for runtime experiments, it mentions generating datasets with DENSIRED and randomly assigning a binary sensitive attribute. However, it does not specify explicit training/test/validation splits, percentages, or predefined partitions for the benchmark datasets used in the main evaluation. |
| Hardware Specification | Yes | The experiments are performed on a MacBook Pro, with an M2 Pro, and 16 GB of RAM using Python 3.9. The runtime experiments are performed on a workstation with an AMD Ryzen Threadripper PRO 3955W, 250 GB RAM, and an RTX 3090. |
| Software Dependencies | No | The experiments are performed on a MacBook Pro, with an M2 Pro, and 16 GB of RAM using Python 3.9. While Python 3.9 is mentioned, no specific version numbers for libraries or other key software components are provided to ensure full reproducibility. |
| Experiment Setup | Yes | Following Schubert et al. (2017), we fix the parameter µ = 2d − 1 for FairDen and show an ablation in App. A.2. The parameters for DBSCAN, minPts and ε, were determined with a hyperparameter optimization. The criterion for the optimization is the DCSI score with a constant setting of minPts_DCSI = 5. The parameter space for minPts_DBSCAN comprises {4, 5, 10, 15, 2d − 1}, with d being the dimension of the dataset, and for ε, {0.01, 0.05, 0.1, .., 0.5, 0.6, .., 2.5, 2.6, 2.8, 3, 3.25, 3.5, 3.75}. The final parameter settings are given in Table 6. |
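
The Research Type row names generalized balance (Bera et al., 2019) as the group-level fairness measure. As a rough illustration, the sketch below computes one common formulation: for each cluster and each sensitive group, compare the group's share in the cluster with its share in the whole dataset, and report the worst ratio. The function name `generalized_balance` and its arguments are hypothetical and not taken from the FairDen code; details such as the handling of DBSCAN-style noise points are assumptions and may differ from the paper's evaluation.

```python
from collections import Counter


def generalized_balance(labels, groups):
    """Worst-case group-share ratio over all clusters, in [0, 1].

    labels: cluster label per point (any hashable values).
    groups: sensitive-group value per point (any hashable values).
    A value of 1.0 means every cluster mirrors the dataset's group
    proportions; 0.0 means some cluster misses a group entirely.
    Assumption: every label, including a noise label such as -1,
    is treated as an ordinary cluster in this sketch.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one data point")

    group_totals = Counter(groups)             # group sizes in the dataset
    cluster_sizes = Counter(labels)            # cluster sizes
    pair_counts = Counter(zip(labels, groups)) # group sizes per cluster

    balance = 1.0
    for cluster, size in cluster_sizes.items():
        for group, total in group_totals.items():
            in_cluster = pair_counts[(cluster, group)]
            if in_cluster == 0:
                return 0.0  # a group is absent from this cluster
            r_dataset = total / n
            r_cluster = in_cluster / size
            balance = min(balance,
                          r_dataset / r_cluster,
                          r_cluster / r_dataset)
    return balance
```

With two equally sized groups, a clustering that splits each group evenly across clusters scores 1.0 under this formulation, while a clustering that separates the groups completely scores 0.0.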