Practical Locally Private Heavy Hitters
Authors: Raef Bassily, Kobbi Nissim, Uri Stemmer, Abhradeep Thakurta
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code. |
| Researcher Affiliation | Academia | Raef Bassily (EMAIL), Department of Computer Science and Engineering, The Ohio State University; Kobbi Nissim (EMAIL), Department of Computer Science, Georgetown University; Uri Stemmer (EMAIL), Department of Computer Science, Ben-Gurion University; Abhradeep Thakurta (EMAIL), Department of Computer Science, University of California, Santa Cruz |
| Pseudocode | Yes | Algorithm 1 LocalRnd — Input: User i's input: vᵢ ∈ V, Flag: Final ∈ {0, 1}. ... Algorithm 2 FreqOracle — Input: Prefix length: ℓ ∈ [log d], a subset of ℓ-bit prefixes B̂ ⊆ {0, 1}^ℓ, collection of t disjoint subsets of users: {Iⱼ : j ∈ [t]}, oracle access to items of relevant users: {vᵢ : i ∈ ∪_{j∈[t]} Iⱼ}, scaling factor: γ, Flag: Final ∈ {0, 1}. ... Algorithm 3 TreeHist (The Full Protocol) — Input: Oracle access to users' items (vᵢ ∈ V : i ∈ [n]). ... Algorithm 4 R: Basic Randomizer — Inputs: x ∈ {±1}, and privacy parameter ϵ. ... Algorithm 5 ExplicitHist — Public randomness: Uniformly random matrix Z ∈ {±1}^{d×n}. ... Algorithm 6 SuccinctHist — Public randomness: Random hash function h : V → [T]. Random partition of [n] into log d subsets I₁, …, I_{log d}. ... Algorithm 7 Hashtogram — Public randomness: Random partition of [n] into R subsets I₁, …, I_R. Random hash functions h₁, …, h_R mapping V to [T]. Uniformly random matrix Z ∈ {±1}^{T×n}. ... Algorithm 8 Bitstogram — Tool used: Binary code (Enc, Dec), where Enc : V → V′, correcting a 1/8-fraction of errors with constant rate, that is, log d′ = O(log d), where d = \|V\| and d′ = \|V′\|. |
| Open Source Code | No | Here we compare ourselves to the only other system (RAPPOR project from GOOGLE) for the private frequency estimation problem whose code is publicly available. We took the snapshot of their code base (https://github.com/google/rappor) on May 9th, 2017. |
| Open Datasets | Yes | We also ran the same experiment (Figure 3) on a real data set drawn uniformly at random from the NLTK Brown corpus (NLT). The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement, and the system parameters are the same default parameters described earlier. |
| Dataset Splits | No | The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement, and the system parameters are the same default parameters described earlier. |
| Hardware Specification | Yes | Our experiments are performed on a mac OS-Sierra 10.12 system (in Python 2.7) with 3.3Ghz (Intel Core i5) and 16GB of DDR-3 RAM. |
| Software Dependencies | No | Our experiments are performed on a mac OS-Sierra 10.12 system (in Python 2.7) with 3.3Ghz (Intel Core i5) and 16GB of DDR-3 RAM. |
| Experiment Setup | Yes | The default parameters used in Figure 1 are: number of data samples (n): 10 million, range of the hash function (m): √n, number of hash functions (t): 285, and the privacy parameter ϵ = 2.0. ... We set 15√n as the threshold for being a heavy hitter. |
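To make the pseudocode row concrete, the one-bit basic randomizer (Algorithm 4 above) is the local-privacy primitive underlying the protocols: each user perturbs a single bit x ∈ {±1} so that the report is ϵ-locally differentially private yet unbiased in expectation. The sketch below is an illustrative reimplementation, not the authors' code; the unbiasing constant `c_eps` and the function name are standard randomized-response choices assumed here.

```python
import math
import random

def basic_randomizer(x, eps, rng=random):
    """Sketch of an eps-LDP one-bit randomizer (randomized response).

    With probability e^eps / (e^eps + 1) report +c_eps * x, otherwise
    report -c_eps * x, where c_eps = (e^eps + 1) / (e^eps - 1) rescales
    the report so that its expectation equals x (unbiasedness).
    """
    assert x in (-1, 1)
    e = math.exp(eps)
    c_eps = (e + 1) / (e - 1)          # unbiasing scale factor
    keep = rng.random() < e / (e + 1)  # probability of reporting truthfully
    return c_eps * x if keep else -c_eps * x

# Usage: a server averages n noisy reports to estimate the mean bit.
if __name__ == "__main__":
    rng = random.Random(0)
    reports = [basic_randomizer(1, eps=2.0, rng=rng) for _ in range(20000)]
    print(sum(reports) / len(reports))  # concentrates near the true value 1.0
```

Averaging many such reports is what the frequency-oracle layers (FreqOracle, Hashtogram) build on: each user contributes one randomized bit of a hashed/encoded item, and the server aggregates the unbiased reports into frequency estimates.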