Differentially Private n-gram Extraction
Authors: Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, Sergey Yekhanin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the performance of our algorithms on two datasets: Reddit and MSNBC. |
| Researcher Affiliation | Industry | Kunho Kim Microsoft EMAIL Sivakanth Gopi Microsoft Research EMAIL Janardhan Kulkarni Microsoft Research EMAIL Sergey Yekhanin Microsoft Research EMAIL |
| Pseudocode | Yes | In this section we describe our algorithm for DPNE. The pseudocode is presented in Algorithm 1. |
| Open Source Code | Yes | Code available at https://github.com/microsoft/differentially-private-ngram-extraction |
| Open Datasets | Yes | The Reddit dataset is a natural language dataset used extensively in NLP applications, and is taken from the TensorFlow repository. The MSNBC dataset consists of page visits by users who browsed msnbc.com on September 28, 1999, recorded at the level of URL and ordered by time. |
| Dataset Splits | No | The paper uses Reddit and MSNBC datasets but does not specify how these datasets were split into training, validation, or test sets for their experiments. No explicit percentages, counts, or references to standard splits are provided for reproducibility. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with their version numbers, such as programming languages, libraries, or frameworks used for implementation or experimentation. |
| Experiment Setup | Yes | Throughout this section we fix T = 9, ε = 4, δ = 10⁻⁷, Δ₁ = ⋯ = Δ₉ = Δ₀ = 300, η = 0.01 unless otherwise specified. |
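The experiment-setup row above lists a privacy budget (ε, δ), per-user contribution caps, and a threshold η, which together suggest a noisy-count-and-threshold release of n-grams. The sketch below illustrates that general pattern only; it is not the paper's Algorithm 1 (which builds n-grams iteratively from previously released (n−1)-grams), and the function name, parameters, and tokenization are assumptions for illustration.

```python
import random
from collections import Counter

def dp_extract_ngrams(user_docs, n, sigma, threshold, max_contrib, rng=None):
    """Illustrative noisy-count-and-threshold sketch for private n-gram release.

    Each user contributes at most `max_contrib` distinct n-grams, Gaussian
    noise of scale `sigma` is added to every count, and only n-grams whose
    noisy count clears `threshold` are released. This is a generic DP
    pattern, not the DPNE algorithm itself.
    """
    rng = rng or random.Random()
    counts = Counter()
    for tokens in user_docs:
        # Distinct n-grams per user, so each user adds at most 1 per n-gram.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        # Cap the per-user contribution (sorted for determinism here).
        for g in sorted(grams)[:max_contrib]:
            counts[g] += 1
    # Release only n-grams whose noisy count exceeds the threshold.
    return {g for g, c in counts.items()
            if c + rng.gauss(0, sigma) > threshold}
```

With noise scale set to zero the function reduces to plain thresholded counting, which makes the mechanics easy to check: two users sharing the bigram ("a", "b") push its count to 2, above a threshold of 1, while all other bigrams are suppressed.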