Approximation Algorithms for Stochastic Clustering

Authors: David G. Harris, Shi Li, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

JMLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages, including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure that every user is guaranteed to get good service (on average). We also complement some of these with impossibility results. ... Keywords: clustering, k-center, k-median, lottery, approximation algorithms
Researcher Affiliation | Collaboration | David G. Harris EMAIL, Department of Computer Science, University of Maryland, College Park, MD 20742; Shi Li EMAIL, University at Buffalo, Buffalo, NY; Thomas Pensyl EMAIL, Bandwidth, Inc., Raleigh, NC; Aravind Srinivasan EMAIL, Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742; Khoa Trinh EMAIL, Google, Mountain View, CA 94043
Pseudocode | Yes | Algorithm 1: Greedy Cluster(F, w); Algorithm 2: Rounding algorithm for half-homogeneous chance k-coverage; Algorithm 3: Iterative rounding algorithm for chance k-coverage; Algorithm 4: Rounding algorithm with clusters; Algorithm 5: Rounding algorithm for k-center; Algorithm 7: Partial-cluster based algorithm; Algorithm 8: Approximation algorithm for E[d(j, S)]: first phase; Algorithm 9: (α, β)-determinization algorithm; Algorithm 10: (1, k + 2)-determinization algorithm
Open Source Code | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7.
Open Datasets | No | The paper focuses on theoretical approximation algorithms for clustering problems and does not describe experiments run on specific datasets; therefore, no concrete access information for open datasets is provided.
Dataset Splits | No | The paper presents theoretical approximation algorithms and does not include empirical experiments with datasets that would require training/test/validation splits.
Hardware Specification | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7. This runs in about an hour on a single CPU core.
Software Dependencies | No | We wrote C code to perform this computation to upper-bound E_Q[R̂] with M = 10, ϵ = 2^-12, L = 7.
Experiment Setup | No | The paper is theoretical, focusing on algorithm design and proofs, and does not contain details about specific experimental setups, hyperparameters, or training configurations.
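The Pseudocode row lists, among others, a subroutine named Greedy Cluster(F, w). The paper's exact procedure is not reproduced in this report, so the following is only a hypothetical sketch of a generic greedy clustering step in that spirit: points are visited in decreasing weight order, and a point becomes a center only if it is far from all previously chosen centers. The ordering rule, the 2·radius separation threshold, and the function name are all assumptions for illustration.

```python
def greedy_cluster(points, weights, dist, radius):
    """Hypothetical greedy clustering sketch (not the paper's Algorithm 1).

    Visit points in decreasing weight order; open a point as a center
    only if it is more than 2*radius away from every existing center.
    """
    centers = []
    for j in sorted(points, key=lambda p: -weights[p]):
        # Keep j only if it is well separated from all chosen centers.
        if all(dist(j, c) > 2 * radius for c in centers):
            centers.append(j)
    return centers
```

For example, on the line with points 0, 1, and 10, weights 3, 2, 1, and radius 1, the sketch opens 0, skips 1 (too close to 0), and opens 10.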