Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
Authors: Tal Wagner
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method with various combinations of kernels and bitsum protocols. Our code is enclosed in the supplementary material and available online. Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015), AG news (Zhang et al., 2015), SST2 (Socher et al., 2013), CIFAR-10 (Krizhevsky, 2009). Figure 1: Classification results with ε_lbl = 5 |
| Researcher Affiliation | Academia | Tal Wagner, The Blavatnik School of Computer Science and AI, Tel-Aviv University, EMAIL. Author is also with Amazon; this work is not associated with Amazon. |
| Pseudocode | Yes | Algorithm 1: Shuffled DP KDE protocol from bitsums. Algorithm 2: Shuffled DP Gaussian KDE protocol, based on either RR or 3NB bitsum protocol. |
| Open Source Code | Yes | Our code is enclosed in the supplementary material and available online. |
| Open Datasets | Yes | Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015): ... AG news (Zhang et al., 2015): ... SST2 (Socher et al., 2013): ... CIFAR-10 (Krizhevsky, 2009): |
| Dataset Splits | Yes | DBPedia-14 (Zhang et al., 2015): Text documents containing summaries of Wikipedia articles. Training examples: 560K, test examples: 70K, classes: 14, task: topic classification. AG news (Zhang et al., 2015): Text documents containing news articles. Training examples: 120K, test examples: 7.6K, classes: 4, task: topic classification. SST2 (Socher et al., 2013): Sentences extracted from movie reviews. Training examples: 67.3K, test examples: 1.82K, classes: 2, task: sentiment classification (positive/negative). CIFAR-10 (Krizhevsky, 2009): Images from different object categories. Training examples: 50K, test examples: 10K, classes: 10, task: depicted object classification. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | Yes | The textual datasets are embedded into 768 dimensions with the Sentence-BERT all-mpnet-base-v2 model (Reimers & Gurevych, 2019). CIFAR-10 is embedded into 6144 dimensions with the SimCLR r152-3x-sk1 model (Chen et al., 2020b). |
| Experiment Setup | Yes | We use ε ∈ (0, 10) to protect the training point x ∈ R^d with (ε, δ)-shuffle DP, and ε_lbl ∈ {3, 5, 7, 10} to protect the label c ∈ [m] with (ε_lbl, 0)-local DP. We use δ = 10^-6 for DBPedia-14 and AG news, and δ = 10^-5 for SST2 and CIFAR-10, accounting for the different dataset sizes. For RR and 3NB, the δ budget in Theorem 3.2 is split equally between the advanced composition parameter δ′ and the total I·Q·δ_0 term of the bitsum protocol instances. To equalize the computational costs of the two kernels, we set the number of repetitions I in Algorithm 1 to d (the embedding dimension). |
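The pseudocode row above notes that Algorithm 1 builds a shuffled DP KDE protocol from bitsum protocols, with RR (randomized response) as one of the underlying primitives. The following is a minimal sketch of a generic ε-LDP randomized-response bitsum, not the paper's exact protocol; the function name and parameters are illustrative assumptions:

```python
import numpy as np

def rr_bitsum(bits, eps, rng):
    """Estimate the sum of private bits via eps-LDP randomized response.

    Each user flips their bit with probability p = 1 / (1 + e^eps),
    which satisfies eps-local differential privacy; the aggregator
    then debiases the noisy sum.
    """
    p = 1.0 / (1.0 + np.exp(eps))               # flip probability
    bits = np.asarray(bits)
    flips = rng.random(len(bits)) < p           # which bits get flipped
    noisy = np.where(flips, 1 - bits, bits)     # randomized reports
    n = len(bits)
    # E[noisy_sum] = (1 - p) * true_sum + p * (n - true_sum),
    # so invert that affine map to get an unbiased estimate.
    return (noisy.sum() - p * n) / (1.0 - 2.0 * p)

rng = np.random.default_rng(0)
true_bits = (rng.random(20000) < 0.3).astype(int)   # ~30% ones
est = rr_bitsum(true_bits, eps=1.0, rng=rng)
```

In a shuffled-model deployment, the randomized bits would pass through a shuffler before aggregation, which is what amplifies the local guarantee to the stronger (ε, δ)-shuffle DP bound the paper invokes.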
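The setup row states that the δ budget is split equally between the advanced composition parameter and the total I·Q·δ_0 term across the bitsum instances. A hedged sketch of that bookkeeping, with Q (the number of bitsum instances per repetition) and the helper names taken as illustrative assumptions rather than the paper's notation:

```python
import math

def split_delta_budget(delta, I, Q):
    """Split a total delta budget equally between the advanced
    composition parameter delta' and the I*Q bitsum instances,
    each of which receives delta_0 = (delta / 2) / (I * Q).

    Illustrative sketch; I and Q are assumed parameters.
    """
    delta_comp = delta / 2.0             # advanced composition parameter
    delta_0 = (delta / 2.0) / (I * Q)    # per-instance bitsum parameter
    return delta_comp, delta_0

def advanced_composition_eps(eps0, k, delta_prime):
    """Standard advanced composition bound: running k mechanisms that
    are each (eps0, delta_0)-DP yields roughly
    (eps0 * sqrt(2 k ln(1/delta')) + k * eps0 * (e^eps0 - 1),
     k * delta_0 + delta')-DP.
    """
    return (eps0 * math.sqrt(2.0 * k * math.log(1.0 / delta_prime))
            + k * eps0 * (math.exp(eps0) - 1.0))

# Example: the DBPedia-14 setting with delta = 1e-6 and I = d = 768
# repetitions; Q = 2 instances per repetition is a made-up placeholder.
delta_comp, delta_0 = split_delta_budget(1e-6, I=768, Q=2)
```

The split is a design choice: halving δ for composition roughly doubles the per-instance noise relative to spending the whole budget one way, but keeps both failure terms of Theorem 3.2 controlled simultaneously.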