Practical Locally Private Heavy Hitters
Authors: Raef Bassily, Kobbi Nissim, Uri Stemmer, Abhradeep Thakurta
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented Algorithm TreeHist to verify our theoretical analysis and compared its performance with the performance of Google's RAPPOR code. |
| Researcher Affiliation | Academia | Raef Bassily (EMAIL), Department of Computer Science and Engineering, The Ohio State University; Kobbi Nissim (EMAIL), Department of Computer Science, Georgetown University; Uri Stemmer (EMAIL), Department of Computer Science, Ben-Gurion University; Abhradeep Thakurta (EMAIL), Department of Computer Science, University of California, Santa Cruz |
| Pseudocode | Yes | Algorithm 1 LocalRnd — Input: User i's input: vᵢ ∈ V, Flag: Final ∈ {0, 1}. ... Algorithm 2 FreqOracle — Input: Prefix length: ℓ ∈ [log d], a subset of ℓ-bit prefixes B̂ ⊆ {0, 1}^ℓ, collection of t disjoint subsets of users: {Iⱼ : j ∈ [t]}, oracle access to items of relevant users: {vᵢ : i ∈ ∪_{j∈[t]} Iⱼ}, scaling factor: γ, Flag: Final ∈ {0, 1}. ... Algorithm 3 TreeHist (The Full Protocol) — Input: Oracle access to users' items (vᵢ ∈ V : i ∈ [n]). ... Algorithm 4 R: Basic Randomizer — Inputs: x ∈ {±1}, and privacy parameter ϵ. ... Algorithm 5 ExplicitHist — Public randomness: Uniformly random matrix Z ∈ {±1}^{d×n}. ... Algorithm 6 SuccinctHist — Public randomness: Random hash function h : V → [T]. Random partition of [n] into log d subsets I₁, …, I_{log d}. ... Algorithm 7 Hashtogram — Public randomness: Random partition of [n] into R subsets I₁, …, I_R. Random hash functions h₁, …, h_R mapping V to [T]. Uniformly random matrix Z ∈ {±1}^{T×n}. ... Algorithm 8 Bitstogram — Tool used: Binary code (Enc, Dec), where Enc : V → V′, correcting a 1/8-fraction of errors with constant rate, that is, log d′ = O(log d), where d = \|V\| and d′ = \|V′\|. |
| Open Source Code | No | Here we compare ourselves to the only other system (RAPPOR project from GOOGLE) for the private frequency estimation problem whose code is publicly available. We took the snapshot of their code base (https://github.com/google/rappor) on May 9th, 2017. |
| Open Datasets | Yes | We also ran the same experiment (Figure 3) on a real data set drawn uniformly at random from the NLTK Brown corpus (NLT). The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement, and the system parameters are the same default parameters described earlier. |
| Dataset Splits | No | The data set we created has n = 10 million samples drawn i.i.d. from the corpus with replacement, and the system parameters are the same default parameters described earlier. |
| Hardware Specification | Yes | Our experiments are performed on a mac OS-Sierra 10.12 system (in Python 2.7) with 3.3Ghz (Intel Core i5) and 16GB of DDR-3 RAM. |
| Software Dependencies | No | Our experiments are performed on a mac OS-Sierra 10.12 system (in Python 2.7) with 3.3Ghz (Intel Core i5) and 16GB of DDR-3 RAM. |
| Experiment Setup | Yes | The default parameters used in Figure 1 are: number of data samples (n): 10 million, range of the hash function (m): √n, number of hash functions (t): 285, and the privacy parameter ϵ = 2.0. ... We set 15√n as the threshold for being a heavy hitter. |
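To make the pseudocode row concrete, the one-bit basic randomizer (Algorithm 4 above) is the local-privacy primitive underlying the protocols: each user perturbs a single bit x ∈ {±1} so that the report is ϵ-locally differentially private yet unbiased in expectation. The sketch below is an illustrative reimplementation, not the authors' code; the unbiasing constant `c_eps` and the function name are standard randomized-response choices assumed here.

```python
import math
import random

def basic_randomizer(x, eps, rng=random):
    """Sketch of an eps-LDP one-bit randomizer (randomized response).

    With probability e^eps / (e^eps + 1) report +c_eps * x, otherwise
    report -c_eps * x, where c_eps = (e^eps + 1) / (e^eps - 1) rescales
    the report so that its expectation equals x (unbiasedness).
    """
    assert x in (-1, 1)
    e = math.exp(eps)
    c_eps = (e + 1) / (e - 1)          # unbiasing scale factor
    keep = rng.random() < e / (e + 1)  # probability of reporting truthfully
    return c_eps * x if keep else -c_eps * x

# Usage: a server averages n noisy reports to estimate the mean bit.
if __name__ == "__main__":
    rng = random.Random(0)
    reports = [basic_randomizer(1, eps=2.0, rng=rng) for _ in range(20000)]
    print(sum(reports) / len(reports))  # concentrates near the true value 1.0
```

Averaging many such reports is what the frequency-oracle layers (FreqOracle, Hashtogram) build on: each user contributes one randomized bit of a hashed/encoded item, and the server aggregates the unbiased reports into frequency estimates.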