Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Locally Private k-Means Clustering
Author: Uri Stemmer
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a new algorithm for the Euclidean k-means problem that operates in the local model of differential privacy. Our algorithm significantly reduces the additive error while keeping the multiplicative error the same as in previous state-of-the-art results. Specifically, on a database of size n, our algorithm guarantees O(1) multiplicative error and n^(1/2+a) additive error for an arbitrarily small constant a > 0. All previous algorithms in the local model had additive error n^(2/3+a). Our techniques extend to k-median clustering. We show that the additive error we obtain is almost optimal in terms of its dependency on the database size n. Specifically, we give a simple lower bound showing that every locally-private algorithm for the k-means objective must have additive error at least √n. |
| Researcher Affiliation | Collaboration | Uri Stemmer, Ben-Gurion University, Beer-Sheva, Israel; Google Research, Tel Aviv, Israel |
| Pseudocode | Yes | Algorithm Weighted Centers. Input: failure probability β, privacy parameters ε, δ. Setting: each player j ∈ [n] holds a value x_j ∈ B(0, Λ); define S = (x_1, ..., x_n). 1. Constructing candidate centers: Denote d = log(log n). For r = Λ, Λ/4, ..., Λ/n, execute (in parallel) algorithm Good Centers on S with the parameter t and the radius r to obtain sets of centers Y^r_1, ..., Y^r_M, lists L^r_1, ..., L^r_M, hash functions h^r_1, ..., h^r_M, and a partition I^r_1, ..., I^r_M ⊆ [n]. Each execution of Good Centers is done with privacy parameters ε/(4 log(n)), δ/log(n). Denote Y = ∪_{r,m} Y^r_m. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials for the methodology described. |
| Open Datasets | No | The paper is theoretical and discusses algorithm design and analysis only; it does not use, release, or reference any datasets for empirical evaluation. |
| Dataset Splits | No | The paper focuses on theoretical algorithm design and analysis, rather than empirical evaluation on specific datasets. Therefore, it does not mention any dataset splits (e.g., training, validation, test sets). |
| Hardware Specification | No | The paper is theoretical, presenting new algorithms, error bounds, and lower bounds for k-means clustering in the local model of differential privacy. It does not describe any experimental setup that would require specific hardware, hence no hardware specifications are provided. |
| Software Dependencies | No | The paper focuses on theoretical algorithm design and analysis. It does not describe an implementation of the algorithms or mention any specific software libraries, frameworks, or their version numbers that would be necessary to replicate experiments. |
| Experiment Setup | No | The paper introduces theoretical algorithms, their error guarantees, and lower bounds. It does not include an experimental section or details on hyperparameters, training configurations, or system-level settings, as it does not report on empirical experiments. |
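The classification above marks the paper as purely theoretical, so no reference implementation exists to reproduce. Purely as an intuition aid (this is not the paper's Weighted Centers algorithm), the sketch below contrasts an exact Lloyd's k-means update with a crude "local model" variant in which every player perturbs their own report before aggregation; the data, the noise scale, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n players, each holding a point in a bounded ball (the paper's
# setting x_j in B(0, Lambda); here Lambda is roughly 1).
n, d, k = 2000, 2, 3
centers_true = rng.uniform(-0.5, 0.5, size=(k, d))
data = centers_true[rng.integers(k, size=n)] + 0.05 * rng.normal(size=(n, d))

def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def lloyd_step(points, centers, noise_scale=0.0):
    """One Lloyd iteration; with noise_scale > 0, each player's report is
    perturbed individually, mimicking the local model's per-user noise."""
    assign = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    new = centers.copy()
    for j in range(len(centers)):
        members = points[assign == j]
        if len(members):
            noisy = members + noise_scale * rng.normal(size=members.shape)
            new[j] = noisy.mean(axis=0)
    return new

init = data[rng.choice(n, k, replace=False)]
c_exact, c_noisy = init, init
for _ in range(10):
    c_exact = lloyd_step(data, c_exact)                   # non-private baseline
    c_noisy = lloyd_step(data, c_noisy, noise_scale=0.5)  # per-player noise

print("exact cost:", kmeans_cost(data, c_exact))
print("noisy cost:", kmeans_cost(data, c_noisy))
```

Because each of a cluster's m members contributes independent noise, the noisy mean deviates by roughly noise_scale/√m per coordinate; summed over clusters, this is the additive-error phenomenon the paper's √n lower bound formalizes for any locally private algorithm.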