Approximation Algorithms for Stochastic Clustering

Authors: David G. Harris, Shi Li, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages, including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure that every user is guaranteed to get good service (on average). We also complement some of these with impossibility results. ... Keywords: clustering, k-center, k-median, lottery, approximation algorithms
Researcher Affiliation | Collaboration | David G. Harris EMAIL, Department of Computer Science, University of Maryland, College Park, MD 20742; Shi Li EMAIL, University at Buffalo, Buffalo, NY; Thomas Pensyl EMAIL, Bandwidth, Inc., Raleigh, NC; Aravind Srinivasan EMAIL, Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742; Khoa Trinh EMAIL, Google, Mountain View, CA 94043
Pseudocode | Yes | Algorithm 1: Greedy Cluster(F, w); Algorithm 2: Rounding algorithm for half-homogeneous chance k-coverage; Algorithm 3: Iterative rounding algorithm for chance k-coverage; Algorithm 4: Rounding algorithm with clusters; Algorithm 5: Rounding algorithm for k-center; Algorithm 7: Partial-cluster based algorithm; Algorithm 8: Approximation algorithm for E[d(j, S)]: first phase; Algorithm 9: (α, β)-determinization algorithm; Algorithm 10: (1, k + 2)-determinization algorithm
Open Source Code | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7.
Open Datasets | No | The paper focuses on theoretical approximation algorithms for clustering problems and does not describe experiments run on specific datasets; therefore, no concrete access information for open datasets is provided.
Dataset Splits | No | The paper presents theoretical approximation algorithms and does not include empirical experiments with datasets that would require training/test/validation splits.
Hardware Specification | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7. This runs in about an hour on a single CPU core.
Software Dependencies | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7.
Experiment Setup | No | The paper is theoretical, focusing on algorithm design and proofs, and does not contain details about specific experimental setups, hyperparameters, or training configurations.
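The Pseudocode row lists, among others, a subroutine named Greedy Cluster(F, w). The paper's exact procedure is not reproduced in this report, so the following is only a hypothetical sketch of a generic greedy clustering step in that spirit: points are visited in decreasing weight order, and a point becomes a center only if it is far from all previously chosen centers. The ordering rule, the 2·radius separation threshold, and the function name are all assumptions for illustration.

```python
def greedy_cluster(points, weights, dist, radius):
    """Hypothetical greedy clustering sketch (not the paper's Algorithm 1).

    Visit points in decreasing weight order; open a point as a center
    only if it is more than 2*radius away from every existing center.
    """
    centers = []
    for j in sorted(points, key=lambda p: -weights[p]):
        # Keep j only if it is well separated from all chosen centers.
        if all(dist(j, c) > 2 * radius for c in centers):
            centers.append(j)
    return centers
```

For example, on the line with points 0, 1, and 10, weights 3, 2, 1, and radius 1, the sketch opens 0, skips 1 (too close to 0), and opens 10.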