reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EL-Clustering: Combining Upper- and Lower-Bounded Clusterings for Equitable Load Constraints

Authors: Rajni Dabas, Neelima Gupta, Rudra Bhardwaj, Sapna Grover

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also do empirical evaluation for k-Median objective on benchmark datasets to show that both, the cost as well as the violation factor are significantly smaller in practice than the theoretical worst-case guarantees.
Researcher Affiliation	Academia	Rajni Dabas EMAIL Department of Computer Science, Northwestern University P.G.D.A.V. College, University of Delhi Neelima Gupta EMAIL Department of Computer Science, University of Delhi Rudra Bhardwaj EMAIL Department of Computer Science, University of Delhi Sapna Grover EMAIL Department of Computer Science, University of Delhi Dyal Singh College, University of Delhi
Pseudocode	Yes	Algorithm 1: Constructing Graph G2(σL, σU) Algorithm 2: Breaking Cycles: Constructing an almost-DAG G2(ˆσL, σU) Algorithm 3: Process(Si) Algorithm 4: Constructing SI
Open Source Code	Yes	All source code, logs, and charts are publicly available8. 8https://github.com/0-rudra-0/el-clustering
Open Datasets	Yes	We empirically evaluate our combination algorithm for the k-Median objective on benchmark datasets from the UCI Machine Learning Repository, as preprocessed in Almanza et al. (2022): Adult, Diabetes, and Bank. The Adult dataset Dua & Graff (2017) contains census records, the Bank dataset Moro et al. (2014) contains data points from a marketing campaign, and the Diabetes dataset Strack et al. (2014) contains admission records of diabetic patients.
Dataset Splits	No	The paper mentions varying clustering parameters (k, L, U) for experiments but does not provide specific training/test/validation dataset splits for the benchmark datasets used.
Hardware Specification	Yes	The algorithm is implemented in Python and executed on a Apple laptop with an Apple M1 processor, 8 GB RAM, running mac OS (Tahoe 26 beta).
Software Dependencies	No	The paper states the algorithm is implemented in Python but does not specify a version number for Python or any specific libraries/packages used.
Experiment Setup	Yes	We vary the number of clusters k for k {10, 25, 50, 75, 100}. The lower bound L and upper bound U are derived from the natural cluster size n/k, with buffers of {10%, 25%, 50%}. Facility pool F of varying sizes ({5%, 10%, 20%, 25%} of the input size) are selected randomly from the input points.