Learning from End User Data with Shuffled Differential Privacy over Kernel Densities

Authors: Tal Wagner

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method with various combinations of kernels and bitsum protocols. Our code is enclosed in the supplementary material and available online. Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015): ... AG news (Zhang et al., 2015): ... SST2 (Socher et al., 2013): ... CIFAR-10 (Krizhevsky, 2009): ... Figure 1: Classification results with ε_lbl = 5"
Researcher Affiliation | Academia | "Tal Wagner, The Blavatnik School of Computer Science and AI, Tel-Aviv University, EMAIL. Author is also with Amazon. This work is not associated with Amazon."
Pseudocode | Yes | "Algorithm 1: Shuffled DP KDE protocol from bitsums. Algorithm 2: Shuffled DP Gaussian KDE protocol, based on either the RR or 3NB bitsum protocol."
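Algorithm 1 assembles a DP KDE estimate from private bitsum subroutines. To make the building block concrete, here is a minimal sketch of the simplest such subroutine, a randomized-response (RR) bitsum: each user flips their bit with probability 1/(e^ε + 1), and the analyzer debiases the aggregated sum. Function names and parameters are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def rr_randomize(bit: int, eps: float, rng: random.Random) -> int:
    """Each user keeps their bit w.p. e^eps/(e^eps + 1), flips it otherwise."""
    p_flip = 1.0 / (math.exp(eps) + 1.0)
    return bit ^ (rng.random() < p_flip)

def rr_debias(noisy_sum: float, n: int, eps: float) -> float:
    """Unbiased estimate of the true bitsum from the randomized reports."""
    p_flip = 1.0 / (math.exp(eps) + 1.0)
    return (noisy_sum - n * p_flip) / (1.0 - 2.0 * p_flip)

rng = random.Random(0)
bits = [1] * 5000 + [0] * 5000  # true bitsum = 5000
reports = [rr_randomize(b, eps=1.0, rng=rng) for b in bits]
estimate = rr_debias(sum(reports), n=len(bits), eps=1.0)
```

In the shuffled model, the per-user reports would be passed through a shuffler before aggregation; the debiasing step at the analyzer is unchanged.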
Open Source Code | Yes | "Our code is enclosed in the supplementary material and available online."
Open Datasets | Yes | "Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015): ... AG news (Zhang et al., 2015): ... SST2 (Socher et al., 2013): ... CIFAR-10 (Krizhevsky, 2009): ..."
Dataset Splits | Yes | "DBPedia-14 (Zhang et al., 2015): Text documents containing summaries of Wikipedia articles. Training examples: 560K, test examples: 70K, classes: 14, task: topic classification. AG news (Zhang et al., 2015): Text documents containing news articles. Training examples: 120K, test examples: 7.6K, classes: 4, task: topic classification. SST2 (Socher et al., 2013): Sentences extracted from movie reviews. Training examples: 67.3K, test examples: 1.82K, classes: 2, task: sentiment classification (positive/negative). CIFAR-10 (Krizhevsky, 2009): Images from different object categories. Training examples: 50K, test examples: 10K, classes: 10, task: depicted object classification."
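The split sizes quoted above can be collected into a small summary structure, e.g. for scripting a reproduction. This is a convenience sketch built from the reported numbers, not part of the paper's code:

```python
# Dataset statistics as reported in the paper (sizes in number of examples).
DATASETS = {
    "DBPedia-14": {"train": 560_000, "test": 70_000, "classes": 14, "task": "topic classification"},
    "AG news":    {"train": 120_000, "test": 7_600,  "classes": 4,  "task": "topic classification"},
    "SST2":       {"train": 67_300,  "test": 1_820,  "classes": 2,  "task": "sentiment classification"},
    "CIFAR-10":   {"train": 50_000,  "test": 10_000, "classes": 10, "task": "object classification"},
}

for name, stats in DATASETS.items():
    total = stats["train"] + stats["test"]
    print(f"{name}: {total} total examples, {stats['classes']} classes")
```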
Hardware Specification | No | "The paper does not explicitly describe the hardware used to run its experiments."
Software Dependencies | Yes | "The textual datasets are embedded into 768 dimensions with the Sentence-BERT all-mpnet-base-v2 model (Reimers & Gurevych, 2019). CIFAR-10 is embedded into 6144 dimensions with the SimCLR r152_3x_sk1 model (Chen et al., 2020b)."
Experiment Setup | Yes | "We use ε ∈ (0, 10) to protect the training point x ∈ R^d with (ε, δ)-shuffle DP, and ε_lbl ∈ {3, 5, 7, 10} to protect the label c ∈ [m] with (ε_lbl, 0)-local DP. We use δ = 10^-6 for DBPedia-14 and AG news, and δ = 10^-5 for SST2 and CIFAR-10, accounting for the different dataset sizes. For RR and 3NB, the δ budget in Theorem 3.2 is split equally between the advanced-composition δ parameter and the total I·Q·δ₀ term of the bitsum protocol instances. To equalize the computational costs of the two kernels, we set the number of repetitions I in Algorithm 1 to d (the embedding dimension)."
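The δ accounting described above can be sketched as follows: the overall budget is split equally between the advanced-composition term and the I·Q bitsum instances, so each instance receives δ₀ = δ/(2·I·Q). This is an illustrative reading of the quoted setup under assumed variable names, not the paper's code:

```python
def split_delta_budget(delta_total: float, I: int, Q: int):
    """Split the total delta equally between advanced composition
    and the I*Q bitsum protocol instances (illustrative)."""
    delta_comp = delta_total / 2.0           # advanced-composition share
    delta_bitsum_total = delta_total / 2.0   # shared across all bitsum calls
    delta_per_instance = delta_bitsum_total / (I * Q)
    return delta_comp, delta_per_instance

# Example: DBPedia-14 setting with delta = 1e-6 and I = d = 768 repetitions;
# Q (bitsum calls per repetition) is a hypothetical placeholder here.
delta_comp, delta0 = split_delta_budget(1e-6, I=768, Q=1)
```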