Fair Clustering in the Sliding Window Model
Authors: Vincent Cohen-Addad, Shaofeng Jiang, Qiaoyuan Yang, Yubo Zhang, Samson Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also implement a number of empirical evaluations on real datasets to complement our theoretical results. |
| Researcher Affiliation | Collaboration | Vincent Cohen-Addad (Google); Shaofeng H.-C. Jiang, Qiaoyuan Yang, and Yubo Zhang (School of Computer Science, Peking University); Samson Zhou (Texas A&M University) |
| Pseudocode | Yes | Algorithm 1: Online assignment-preserving coreset construction, procedure ONLINECORESET(P, ε, δ) ...; Algorithm 2: Coreset for a single ring, procedure RINGCORESET(P, ε, δ) ...; Algorithm 3: Sliding window coreset algorithm based on online assignment-preserving coreset, procedure MERGEANDREDUCE(P) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate the algorithms on 5 real datasets: Adult (Becker & Kohavi, 1996), Bank (Moro S & P, 2014), Diabetes (Kahn), Athlete (Barshan & Altun, 2010), and Census (Meek et al., 2001), which have also been used in various previous studies on (fair) clustering (Bera et al., 2019a; Chierichetti et al., 2017; Schmidt et al., 2018; Huang et al., 2019). |
| Dataset Splits | No | The paper mentions the window size and target coreset sizes for each dataset, but it does not specify explicit training/test/validation splits (e.g., percentages or counts) or cite predefined splits. |
| Hardware Specification | Yes | All the experiments are run on a MacBook Air 15.3 with an Apple M3 chip (8 cores, 2.22 GHz), 16GB RAM, and macOS 14.4.1 (23E224). |
| Software Dependencies | No | The paper mentions using "Fairtree (Backurs et al., 2019)" but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | We choose k = 10 in all experiments. When implementing our coreset, we directly specify a target coreset size instead of using the worst-case bound established in previous sections. Because dataset sizes and the corresponding window sizes vary, we assign a different coreset size to each dataset: we set the target size to 150 for both Adult and Bank, 300 for Diabetes, 750 for Athlete, and 1500 for Census. |
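The paper does not release code. The Pseudocode and Experiment Setup rows describe a merge-and-reduce sliding-window coreset run with a fixed target size. The following is a minimal, hypothetical sketch of that skeleton only. `SlidingMergeAndReduce` and `reduce_uniform` are illustrative names, not the authors' API.

The sketch makes several assumptions:
- The reduce step is plain uniform sampling with reweighting. It stands in for ONLINECORESET/RINGCORESET and has none of their guarantees.
- Fairness constraints are ignored.
- Expiry is coarse: a bucket is dropped only when all of its points have left the window. Merging can therefore fold old points into a bucket that still holds live ones.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Weighted = List[Tuple[Point, float]]  # (point, weight) pairs


def reduce_uniform(pts: Weighted, target: int, rng: random.Random) -> Weighted:
    """Placeholder reduce step (assumption, NOT the paper's coreset):
    uniform sampling, rescaled so the total weight is preserved."""
    if len(pts) <= target:
        return list(pts)
    total = sum(w for _, w in pts)
    sample = rng.sample(pts, target)
    scale = total / sum(w for _, w in sample)
    return [(p, w * scale) for p, w in sample]


class SlidingMergeAndReduce:
    """Hypothetical merge-and-reduce over fixed-size blocks of a stream.
    levels[i] = (first_time, last_time, summary); higher levels hold older data."""

    def __init__(self, window: int, block_size: int, target: int, seed: int = 0):
        assert window >= block_size, "sketch assumes the window spans at least one block"
        self.window, self.block_size, self.target = window, block_size, target
        self.rng = random.Random(seed)
        self.t = 0                    # number of points seen so far
        self.buffer: Weighted = []    # newest points, not yet reduced
        self.levels: Dict[int, Tuple[int, int, Weighted]] = {}

    def insert(self, p: Point) -> None:
        self.t += 1
        # Coarse expiry: drop buckets whose newest point is outside [t - window + 1, t].
        horizon = self.t - self.window
        expired = [lvl for lvl, (_, last, _) in self.levels.items() if last <= horizon]
        for lvl in expired:
            del self.levels[lvl]
        self.buffer.append((p, 1.0))
        if len(self.buffer) < self.block_size:
            return
        # A full block becomes a level-0 summary, then carries upward like a binary counter.
        first = self.t - self.block_size + 1
        carry = (first, self.t, reduce_uniform(self.buffer, self.target, self.rng))
        self.buffer = []
        level = 0
        while level in self.levels:
            old_first, _, old = self.levels.pop(level)
            merged = reduce_uniform(old + carry[2], self.target, self.rng)
            carry = (old_first, carry[1], merged)
            level += 1
        self.levels[level] = carry

    def coreset(self) -> Weighted:
        """Union of the unreduced buffer and all surviving bucket summaries."""
        out = list(self.buffer)
        for _, _, summary in self.levels.values():
            out.extend(summary)
        return out
```

For example, with `window=200`, `block_size=100`, and `target=50`, inserting 600 points leaves a single level-1 bucket covering arrivals 401 to 600. That bucket holds 50 weighted points with total weight 200. In a run mirroring the paper's setup one would pass the per-dataset targets from the table, for example `target=150` for Adult. The window and block sizes here are illustrative only, since the report does not give them.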