Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CoinPress: Practical Private Mean and Covariance Estimation
Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets, showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. |
| Researcher Affiliation | Collaboration | Sourav Biswas Cheriton School of Computer Science University of Waterloo EMAIL Yihe Dong Microsoft EMAIL Gautam Kamath Cheriton School of Computer Science University of Waterloo EMAIL Jonathan Ullman Khoury College of Computer Sciences Northeastern University EMAIL |
| Pseudocode | Yes | Figure 2: Mean Estimation Algorithms |
| Open Source Code | Yes | Full version of the paper and code are available in the supplement, or alternatively at the following URLs: https://arxiv.org/abs/2006.06618 and https://github.com/twistedcubic/coin-press. |
| Open Datasets | Yes | We generate a dataset of n samples from a d-dimensional Gaussian N(0, I), where we are promised the mean is bounded in ℓ2-distance by R. We revisit the classic "genes mirror geography" discovery of Novembre et al. [30]. We obtained a 20-dimensional version of the dataset (n = 1387) from the authors' GitHub: https://github.com/NovembreLab/Novembre_etal_2008_misc |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It evaluates parameter estimators on generated samples or a given dataset rather than training a supervised model with distinct splits. |
| Hardware Specification | Yes | Most of the plots (each involving about 1,000 runs of our algorithm) took only a few minutes to generate on a laptop computer with an Intel Core i7-7700HQ CPU. |
| Software Dependencies | No | The paper mentions using its own implementation and the code from [14] for SYMQ, but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Our algorithm has essentially four hyperparameters: choice of t, splitting the privacy budget, radius of the clipping ball, and radius of the confidence ball. We explore the role of t in our experiments. We found that assigning most of the privacy budget to the final iteration increased performance, namely 3ρ/4 goes to the final iteration and ρ/(4(t−1)) to each other. We run each method 100 times, and report the trimmed mean, with trimming parameter 0.1. |
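The experiment-setup row describes two concrete procedures: an uneven privacy-budget split across the t iterations (3ρ/4 to the last iteration, ρ/(4(t−1)) to each earlier one) and reporting the trimmed mean over 100 runs with trimming parameter 0.1. A minimal sketch of both, assuming a zCDP-style additive budget; function names here are illustrative, not from the paper's released code:

```python
import numpy as np

def split_privacy_budget(rho, t):
    """Split a total budget rho across t iterations as the paper reports:
    3*rho/4 to the final iteration, rho/(4*(t-1)) to each earlier one.
    Per-iteration budgets sum back to rho."""
    assert t >= 2, "need at least two iterations to split unevenly"
    per_early = rho / (4 * (t - 1))
    return [per_early] * (t - 1) + [3 * rho / 4]

def trimmed_mean(values, trim=0.1):
    """Symmetric trimmed mean: drop the smallest and largest `trim`
    fraction of values, then average the rest (trim=0.1 in the paper)."""
    values = np.sort(np.asarray(values, dtype=float))
    k = int(trim * len(values))
    return values[k:len(values) - k].mean()

# Example: a budget of rho = 0.5 over t = 4 iterations.
budgets = split_privacy_budget(rho=0.5, t=4)
assert abs(sum(budgets) - 0.5) < 1e-12  # the split conserves the total budget
```

This mirrors how the reported numbers are aggregated: each method runs repeatedly, and the trimmed mean damps the influence of outlier runs without discarding the bulk of the distribution.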