Does Training with Synthetic Data Truly Protect Privacy?

Authors: Yunpeng Zhao, Jie Zhang

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental To rigorously measure the privacy leakage of empirical methods trained on synthetic data, we use membership inference attacks (Shokri et al., 2017) as a privacy auditing tool. We provide a systematic privacy evaluation on these four training paradigms. For each training paradigm, we interact only with the final model trained on synthetic data, and then determine whether a particular data point was part of the private training dataset. We conduct all experiments on CIFAR-10 (Krizhevsky & Hinton, 2009), as all training methods are scalable to CIFAR-10 and achieve good test accuracy. We report the performance of these methods across three dimensions: privacy leakage (TPR@0.1% FPR), model utility (test accuracy), and efficiency (training time).
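The privacy-leakage metric quoted above, TPR at 0.1% FPR, measures how many true members a membership inference attack can identify while almost never falsely flagging a non-member. As a minimal sketch (not the paper's code; function and variable names are illustrative), given per-sample attack scores where higher means "more likely a member", one can estimate it by thresholding at the appropriate quantile of the non-member scores:

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.001):
    """Estimate the attack's TPR at a fixed low FPR.

    The threshold is set at the (1 - target_fpr) quantile of the
    non-member scores, so approximately target_fpr of non-members
    are falsely flagged; the TPR is the fraction of true members
    whose score exceeds that threshold.
    """
    thresh = np.quantile(np.asarray(nonmember_scores), 1.0 - target_fpr)
    return float(np.mean(np.asarray(member_scores) > thresh))
```

In practice the scores would come from a likelihood-ratio-style attack evaluated against the shadow models; this sketch only shows how the headline number is read off the score distributions.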
Researcher Affiliation Academia Yunpeng Zhao (National University of Singapore); Jie Zhang (ETH Zurich)
Pseudocode No The paper describes various methods (Coreset Selection, Dataset Distillation, Data-Free Knowledge Distillation, Synthetic Data from Fine-Tuned Diffusion Models) using mathematical formulations and textual descriptions, but it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes The source code is available at https://github.com/yunpeng-zhao/syndata-privacy.
Open Datasets Yes We conduct all experiments on CIFAR-10 (Krizhevsky & Hinton, 2009)... For example, we use CINIC-10 (Darlow et al., 2018), an extension of CIFAR-10 incorporating downsampled ImageNet images, for initialization.
Dataset Splits Yes We designate 500 random data points as audit samples on which we evaluate membership inference, and we use mislabeled data as strong canaries to simulate worst-case data; the remaining 49,500 samples are always included in every model's training data. For each method, we train 32 shadow models, ensuring that each audit sample is included in the training data of 16 models.
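The shadow-model protocol above puts each audit sample in the training set of exactly half the models (16 of 32), so every sample has balanced "in" and "out" models for the attack. A minimal sketch of such an assignment (illustrative names, not the paper's implementation):

```python
import numpy as np

def assign_audit_samples(n_audit=500, n_models=32, n_in=16, seed=0):
    """Build a boolean inclusion matrix of shape (n_audit, n_models).

    inclusion[i, j] is True iff audit sample i is placed in shadow
    model j's training set; each sample lands in exactly n_in models,
    chosen uniformly at random.
    """
    rng = np.random.default_rng(seed)
    inclusion = np.zeros((n_audit, n_models), dtype=bool)
    for i in range(n_audit):
        inclusion[i, rng.permutation(n_models)[:n_in]] = True
    return inclusion
```

Each shadow model j is then trained on the 49,500 always-included samples plus the audit samples with `inclusion[:, j]` set, and the attack compares a sample's score under its "in" versus "out" models.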
Hardware Specification No The paper mentions 'TESLA (Cui et al., 2023)' and 'TESLA version' in the context of memory for specific methods but does not explicitly state the specific GPU or CPU models or other hardware used for running their experiments. It primarily focuses on software-level details and training protocols rather than hardware specifications.
Software Dependencies No The paper describes various training procedures, optimizers (SGD), and network architectures (ResNet-18, ConvNet) but does not provide specific version numbers for software libraries, frameworks, or programming languages used.
Experiment Setup Yes For the undefended baseline, we employ the same training procedure as described in (Aerni et al., 2024). Concretely, ResNet-18 models are trained using the SGD optimizer with a momentum of 0.9 and a weight decay of 0.0005. We use a batch size of 256 and typical data augmentation techniques, including random horizontal flips and random shifts of up to 4 pixels. The models are optimized over 200 epochs with a base learning rate of 0.1. We employ a linear warm-up of the learning rate during the first epoch, followed by a decay of the learning rate by a factor of 0.2 at epochs 60, 120, and 160. For each method, we train 32 shadow models... For all defenses, we consistently adopt ResNet-18 (He et al., 2016) as the network architecture of shadow models.