EL-Clustering: Combining Upper- and Lower-Bounded Clusterings for Equitable Load Constraints

Authors: Rajni Dabas, Neelima Gupta, Rudra Bhardwaj, Sapna Grover

TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We also do empirical evaluation for k-Median objective on benchmark datasets to show that both, the cost as well as the violation factor are significantly smaller in practice than the theoretical worst-case guarantees.
Researcher Affiliation Academia Rajni Dabas EMAIL Department of Computer Science, Northwestern University P.G.D.A.V. College, University of Delhi Neelima Gupta EMAIL Department of Computer Science, University of Delhi Rudra Bhardwaj EMAIL Department of Computer Science, University of Delhi Sapna Grover EMAIL Department of Computer Science, University of Delhi Dyal Singh College, University of Delhi
Pseudocode Yes Algorithm 1: Constructing Graph G2(σL, σU) Algorithm 2: Breaking Cycles: Constructing an almost-DAG G2(ˆσL, σU) Algorithm 3: Process(Si) Algorithm 4: Constructing SI
Open Source Code Yes All source code, logs, and charts are publicly available8. 8https://github.com/0-rudra-0/el-clustering
Open Datasets Yes We empirically evaluate our combination algorithm for the k-Median objective on benchmark datasets from the UCI Machine Learning Repository, as preprocessed in Almanza et al. (2022): Adult, Diabetes, and Bank. The Adult dataset Dua & Graff (2017) contains census records, the Bank dataset Moro et al. (2014) contains data points from a marketing campaign, and the Diabetes dataset Strack et al. (2014) contains admission records of diabetic patients.
Dataset Splits No The paper mentions varying clustering parameters (k, L, U) for experiments but does not provide specific training/test/validation dataset splits for the benchmark datasets used.
Hardware Specification Yes The algorithm is implemented in Python and executed on a Apple laptop with an Apple M1 processor, 8 GB RAM, running mac OS (Tahoe 26 beta).
Software Dependencies No The paper states the algorithm is implemented in Python but does not specify a version number for Python or any specific libraries/packages used.
Experiment Setup Yes We vary the number of clusters k for k {10, 25, 50, 75, 100}. The lower bound L and upper bound U are derived from the natural cluster size n/k, with buffers of {10%, 25%, 50%}. Facility pool F of varying sizes ({5%, 10%, 20%, 25%} of the input size) are selected randomly from the input points.