Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CoinPress: Practical Private Mean and Covariance Estimation
Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets, showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. |
| Researcher Affiliation | Collaboration | Sourav Biswas Cheriton School of Computer Science University of Waterloo EMAIL Yihe Dong Microsoft EMAIL Gautam Kamath Cheriton School of Computer Science University of Waterloo EMAIL Jonathan Ullman Khoury College of Computer Sciences Northeastern University EMAIL |
| Pseudocode | Yes | Figure 2: Mean Estimation Algorithms |
| Open Source Code | Yes | Full version of the paper and code are available in the supplement, or alternatively at the following URLs: https://arxiv.org/abs/2006.06618 and https://github.com/twistedcubic/coin-press. |
| Open Datasets | Yes | We generate a dataset of n samples from a d-dimensional Gaussian N(0, I), where we are promised the mean is bounded in ℓ2-distance by R. We revisit the classic "genes mirror geography" discovery of Novembre et al. [30]. We obtained a 20-dimensional version of the dataset (n = 1387) from the authors' GitHub: https://github.com/NovembreLab/Novembre_etal_2008_misc |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It evaluates parameter estimators on generated samples or a given dataset rather than training a supervised model with distinct splits. |
| Hardware Specification | Yes | Most of the plots (each involving about 1,000 runs of our algorithm) took only a few minutes to generate on a laptop computer with an Intel Core i7-7700HQ CPU. |
| Software Dependencies | No | The paper mentions using its own implementation and the code from [14] for SYMQ, but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Our algorithm has essentially four hyperparameters: choice of t, splitting the privacy budget, radius of the clipping ball, and radius of the confidence ball. We explore the role of t in our experiments. We found that assigning most of the privacy budget to the final iteration increased performance, namely 3ρ/4 goes to the final iteration and ρ/(4(t−1)) to each other. We run each method 100 times, and report the trimmed mean, with trimming parameter 0.1. |
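The experiment-setup row describes two concrete procedures: an uneven privacy-budget split across the t iterations (3ρ/4 to the last iteration, ρ/(4(t−1)) to each earlier one) and reporting the trimmed mean over 100 runs with trimming parameter 0.1. A minimal sketch of both, assuming a zCDP-style additive budget; function names here are illustrative, not from the paper's released code:

```python
import numpy as np

def split_privacy_budget(rho, t):
    """Split a total budget rho across t iterations as the paper reports:
    3*rho/4 to the final iteration, rho/(4*(t-1)) to each earlier one.
    Per-iteration budgets sum back to rho."""
    assert t >= 2, "need at least two iterations to split unevenly"
    per_early = rho / (4 * (t - 1))
    return [per_early] * (t - 1) + [3 * rho / 4]

def trimmed_mean(values, trim=0.1):
    """Symmetric trimmed mean: drop the smallest and largest `trim`
    fraction of values, then average the rest (trim=0.1 in the paper)."""
    values = np.sort(np.asarray(values, dtype=float))
    k = int(trim * len(values))
    return values[k:len(values) - k].mean()

# Example: a budget of rho = 0.5 over t = 4 iterations.
budgets = split_privacy_budget(rho=0.5, t=4)
assert abs(sum(budgets) - 0.5) < 1e-12  # the split conserves the total budget
```

This mirrors how the reported numbers are aggregated: each method runs repeatedly, and the trimmed mean damps the influence of outlier runs without discarding the bulk of the distribution.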