Fair Clustering in the Sliding Window Model

Authors: Vincent Cohen-Addad, Shaofeng Jiang, Qiaoyuan Yang, Yubo Zhang, Samson Zhou

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also implement a number of empirical evaluations on real datasets to complement our theoretical results.
Researcher Affiliation | Collaboration | Vincent Cohen-Addad (Google); Shaofeng H.-C. Jiang (School of Computer Science, Peking University); Qiaoyuan Yang (School of Computer Science, Peking University); Yubo Zhang (School of Computer Science, Peking University); Samson Zhou (Texas A&M University)
Pseudocode | Yes | Algorithm 1 Online assignment-preserving coreset construction procedure ONLINECORESET(P, ε, δ) ... Algorithm 2 Coreset for a single ring procedure RINGCORESET(P, ε, δ) ... Algorithm 3 Sliding window coreset algorithm based on online assignment-preserving coreset procedure MERGEANDREDUCE(P)
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We evaluate the algorithms on 5 real datasets: Adult (Becker & Kohavi, 1996), Bank (Moro S & P, 2014), Diabetes (Kahn), Athlete (Barshan & Altun, 2010), and Census (Meek et al., 2001), which have also been used in various previous studies on (fair) clustering (Bera et al., 2019a; Chierichetti et al., 2017; Schmidt et al., 2018; Huang et al., 2019).
Dataset Splits | No | The paper mentions the window size and target coreset sizes for each dataset, but it does not specify explicit training/test/validation splits (e.g., percentages or counts) or cite predefined splits.
Hardware Specification | Yes | All the experiments are run on a MacBook Air 15.3 with an Apple M3 chip (8 cores, 2.22 GHz), 16GB RAM, and macOS 14.4.1 (23E224).
Software Dependencies | No | The paper mentions using "Fairtree (Backurs et al., 2019)" but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | We choose k = 10 in all experiments. When implementing our coreset, we directly specify a target coreset size instead of using the worst-case bound as we established in previous sections. Due to variations in dataset sizes and corresponding window sizes, we assigned different coreset sizes for each dataset. We set this target size 150 for both Adult and Bank, 300 for Diabetes, 750 for Athlete, and 1500 for Census.
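
The reported experiment setup can be captured as a small configuration. The following is a minimal sketch in Python, not the authors' code: the names K, TARGET_CORESET_SIZE, ExperimentConfig, and make_config are hypothetical, and only the numeric values (k = 10 and the per-dataset target coreset sizes) come from the quoted text. Window sizes are not stated in this summary, so they are left out rather than guessed.

```python
from dataclasses import dataclass

# Number of clusters, reported as k = 10 for all experiments.
K = 10

# Target coreset size per dataset, as reported in the experiment setup.
TARGET_CORESET_SIZE = {
    "Adult": 150,
    "Bank": 150,
    "Diabetes": 300,
    "Athlete": 750,
    "Census": 1500,
}


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for one experiment's parameters."""
    dataset: str
    k: int
    coreset_size: int


def make_config(dataset: str) -> ExperimentConfig:
    """Build the configuration for a dataset named in the summary."""
    if dataset not in TARGET_CORESET_SIZE:
        raise KeyError(f"no reported coreset size for dataset {dataset!r}")
    return ExperimentConfig(dataset, K, TARGET_CORESET_SIZE[dataset])


if __name__ == "__main__":
    # Print one configuration per dataset listed in the summary.
    for name in TARGET_CORESET_SIZE:
        print(make_config(name))
```

Keeping the sizes in one mapping makes it easy to compare them against the paper when reproducing the runs; any dataset not listed raises an error instead of silently falling back to a default.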