Relative Error Fair Clustering in the Weak-Strong Oracle Model
Authors: Vladimir Braverman, Prathamesh Dharangutte, Shaofeng H.-C. Jiang, Hoai-An Nguyen, Chen Wang, Yubo Zhang, Samson Zhou
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement and evaluate our algorithm on two real-world datasets for fair k-median. We use Adult which has about 50, 000 instances and 8 features (Becker & Kohavi, 1996). The second dataset is Default of Credit Card Clients with about 30, 000 instances and 9 features (Yeh, 2009). For both we make the sensitive attribute gender. ... We compare our algorithm against a baseline algorithm that uniformly at random samples and re-weights points. The experiments show that the proposed algorithm in Theorem 1 could significantly outperform the uniform baseline. We defer the experiments to Section 5. |
| Researcher Affiliation | Collaboration | 1Google Research 2Department of Computer Science, Rutgers University 3School of Computer Science, Peking University 4Computer Science Department, Carnegie Mellon University 5Department of Computer Science, Rice University 6Department of Computer Science & Engineering, Texas A&M University. |
| Pseudocode | Yes | Algorithm 1. The construction algorithm for (1 + ε) coreset ... Algorithm 2 (CORESET-UPDATE). An algorithm that adds points to coreset. ... Algorithm 3 (PEELING). An algorithm that removes points. ... Algorithm 4 (CONSERVATIVE-PEELING). An algorithm that removes points. ... Algorithm 5. The construction algorithm for (1 + ε) coreset ... Algorithm 6. A charging scheme (a thought process for the analysis purpose only). ... Algorithm 7. A charging scheme (a thought process for the analysis purpose only). |
| Open Source Code | No | All experiments were run locally on a Macbook Air. The code can be found here. |
| Open Datasets | Yes | We use Adult which has about 50, 000 instances and 8 features (Becker & Kohavi, 1996). The second dataset is Default of Credit Card Clients with about 30, 000 instances and 9 features (Yeh, 2009). |
| Dataset Splits | No | Due to computational limits, we first subsampled the dataset to size roughly 2000 using Meyerson sampling (Meyerson, 2001a). ... After subsampling the original dataset using Meyerson sampling to a smaller dataset, we ran our fair coreset algorithm on this smaller dataset to create a coreset of size 100. We also made a uniform coreset as a baseline which simply uniformly selected 100 points. |
| Hardware Specification | Yes | All experiments were run locally on a Macbook Air. |
| Software Dependencies | No | We use the fairtree algorithm (Backurs et al., 2019), which we run on top of the uniform and our coreset. ... Due to computational limits, we first subsampled the dataset to size roughly 2000 using Meyerson sampling (Meyerson, 2001a). |
| Experiment Setup | Yes | For all experiments we use the values of balance parameters p = 1 and q = 10 (refer to Backurs et al. (2019) for more details) and we report the cost averaged over 10 independent runs. |