Learning from End User Data with Shuffled Differential Privacy over Kernel Densities

Authors: Tal Wagner

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method with various combinations of kernels and bitsum protocols. Our code is enclosed in the supplementary material and available online. Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015): ... AG news (Zhang et al., 2015): ... SST2 (Socher et al., 2013): ... CIFAR-10 (Krizhevsky, 2009): ... Figure 1: Classification results with ε_lbl = 5"
Researcher Affiliation | Academia | "Tal Wagner, The Blavatnik School of Computer Science and AI, Tel-Aviv University, EMAIL. Author is also with Amazon. This work is not associated with Amazon."
Pseudocode | Yes | "Algorithm 1: Shuffled DP KDE protocol from bitsums. Algorithm 2: Shuffled DP Gaussian KDE protocol, based on either the RR or 3NB bitsum protocol."
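Algorithm 1 assembles a DP KDE estimate from private bitsum subroutines. To make the building block concrete, here is a minimal sketch of the simplest such subroutine, a randomized-response (RR) bitsum: each user flips their bit with probability 1/(e^ε + 1), and the analyzer debiases the aggregated sum. Function names and parameters are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def rr_randomize(bit: int, eps: float, rng: random.Random) -> int:
    """Each user keeps their bit w.p. e^eps/(e^eps + 1), flips it otherwise."""
    p_flip = 1.0 / (math.exp(eps) + 1.0)
    return bit ^ (rng.random() < p_flip)

def rr_debias(noisy_sum: float, n: int, eps: float) -> float:
    """Unbiased estimate of the true bitsum from the randomized reports."""
    p_flip = 1.0 / (math.exp(eps) + 1.0)
    return (noisy_sum - n * p_flip) / (1.0 - 2.0 * p_flip)

rng = random.Random(0)
bits = [1] * 5000 + [0] * 5000  # true bitsum = 5000
reports = [rr_randomize(b, eps=1.0, rng=rng) for b in bits]
estimate = rr_debias(sum(reports), n=len(bits), eps=1.0)
```

In the shuffled model, the per-user reports would be passed through a shuffler before aggregation; the debiasing step at the analyzer is unchanged.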
Open Source Code | Yes | "Our code is enclosed in the supplementary material and available online."
Open Datasets | Yes | "Datasets. We use three textual datasets and one image dataset: DBPedia-14 (Zhang et al., 2015): ... AG news (Zhang et al., 2015): ... SST2 (Socher et al., 2013): ... CIFAR-10 (Krizhevsky, 2009): ..."
Dataset Splits | Yes | "DBPedia-14 (Zhang et al., 2015): Text documents containing summaries of Wikipedia articles. Training examples: 560K, test examples: 70K, classes: 14, task: topic classification. AG news (Zhang et al., 2015): Text documents containing news articles. Training examples: 120K, test examples: 7.6K, classes: 4, task: topic classification. SST2 (Socher et al., 2013): Sentences extracted from movie reviews. Training examples: 67.3K, test examples: 1.82K, classes: 2, task: sentiment classification (positive/negative). CIFAR-10 (Krizhevsky, 2009): Images from different object categories. Training examples: 50K, test examples: 10K, classes: 10, task: depicted object classification."
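The split sizes quoted above can be collected into a small summary structure, e.g. for scripting a reproduction. This is a convenience sketch built from the reported numbers, not part of the paper's code:

```python
# Dataset statistics as reported in the paper (sizes in number of examples).
DATASETS = {
    "DBPedia-14": {"train": 560_000, "test": 70_000, "classes": 14, "task": "topic classification"},
    "AG news":    {"train": 120_000, "test": 7_600,  "classes": 4,  "task": "topic classification"},
    "SST2":       {"train": 67_300,  "test": 1_820,  "classes": 2,  "task": "sentiment classification"},
    "CIFAR-10":   {"train": 50_000,  "test": 10_000, "classes": 10, "task": "object classification"},
}

for name, stats in DATASETS.items():
    total = stats["train"] + stats["test"]
    print(f"{name}: {total} total examples, {stats['classes']} classes")
```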
Hardware Specification | No | "The paper does not explicitly describe the hardware used to run its experiments."
Software Dependencies | Yes | "The textual datasets are embedded into 768 dimensions with the Sentence-BERT all-mpnet-base-v2 model (Reimers & Gurevych, 2019). CIFAR-10 is embedded into 6144 dimensions with the SimCLR r152_3x_sk1 model (Chen et al., 2020b)."
Experiment Setup | Yes | "We use ε ∈ (0, 10) to protect the training point x ∈ R^d with (ε, δ)-shuffle DP, and ε_lbl ∈ {3, 5, 7, 10} to protect the label c ∈ [m] with (ε_lbl, 0)-local DP. We use δ = 10^-6 for DBPedia-14 and AG news, and δ = 10^-5 for SST2 and CIFAR-10, accounting for the different dataset sizes. For RR and 3NB, the δ budget in Theorem 3.2 is split equally between the advanced-composition δ parameter and the total I·Q·δ₀ term of the bitsum protocol instances. To equalize the computational costs of the two kernels, we set the number of repetitions I in Algorithm 1 to d (the embedding dimension)."
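The δ accounting described above can be sketched as follows: the overall budget is split equally between the advanced-composition term and the I·Q bitsum instances, so each instance receives δ₀ = δ/(2·I·Q). This is an illustrative reading of the quoted setup under assumed variable names, not the paper's code:

```python
def split_delta_budget(delta_total: float, I: int, Q: int):
    """Split the total delta equally between advanced composition
    and the I*Q bitsum protocol instances (illustrative)."""
    delta_comp = delta_total / 2.0           # advanced-composition share
    delta_bitsum_total = delta_total / 2.0   # shared across all bitsum calls
    delta_per_instance = delta_bitsum_total / (I * Q)
    return delta_comp, delta_per_instance

# Example: DBPedia-14 setting with delta = 1e-6 and I = d = 768 repetitions;
# Q (bitsum calls per repetition) is a hypothetical placeholder here.
delta_comp, delta0 = split_delta_budget(1e-6, I=768, Q=1)
```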