DP-LFlow: Differentially Private Latent Flow for Scalable Sensitive Image Generation
Authors: Dihong Jiang, Sun Sun
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness and scalability of the proposed method via extensive experiments, where the proposed method achieves a significantly better privacy-utility trade-off compared to existing alternatives. Notably, our method is the first DPGM to scale to high-resolution image sets (up to 256×256). In this section, we evaluate and compare DP-LFlow against SoTA baselines through extensive experiments in Section 4.2. More importantly, we will show that DP-LFlow is amenable to high-resolution image sets in Section 4.3, which was hardly studied in prior related works. |
| Researcher Affiliation | Academia | Dihong Jiang, Department of Computer Science, University of Waterloo; Sun Sun, National Research Council Canada & University of Waterloo |
| Pseudocode | Yes | Algorithm 1: Gradient perturbation in DP-SGD. Input: private training set X = {x_i}_{i=1}^N, loss function L(·), batch size B, noise multiplier σ, gradient clipping bound C, model parameters θ. 1: for i = 1 to B do 2: g_θ(x_i) = ∇_θ L(x_i; θ) 3: g̃_θ(x_i) = g_θ(x_i) · min(1, C / ‖g_θ(x_i)‖_2) 4: ḡ_θ = (1/B) (Σ_{i=1}^B g̃_θ(x_i) + N(0, σ²C²I)) |
| Open Source Code | Yes | Our code is available at https://github.com/dihjiang/DP-LFlow. |
| Open Datasets | Yes | Datasets: We consider three widely used image datasets, including both grayscale images (MNIST (LeCun et al., 1998), Fashion MNIST (Xiao et al., 2017)) and RGB images (CelebA (Liu et al., 2015)), as well as one high-resolution RGB dataset (CelebA-HQ (Karras et al., 2018), for our presentation only). |
| Dataset Splits | Yes | MNIST (LeCun et al., 1998) & Fashion MNIST (Xiao et al., 2017): ... We adopt the official training and test split. 10k images from the training split are randomly held out as the validation set. CelebA (Liu et al., 2015): ... We also adopt the official training, validation and test split, but randomly select 50k images of each gender from the training split as our training set. CelebA-HQ (Karras et al., 2018): ... 1999 images are randomly held out from the training split as the validation set. |
| Hardware Specification | Yes | However, their required computational resource is significantly higher than DP-LFlow's, e.g. Dockhorn et al. (2023) need 8 GPUs and one day to train a DP diffusion model on MNIST and FMNIST, while our method only requires 1 single GPU and a few hours; on CelebA, Dockhorn et al. (2023) need 8 GPUs and 4 days, while our method only needs 1 GPU and around half a day. |
| Software Dependencies | No | We use a public repo, i.e. pyvacy, for implementing the DP-SGD algorithm, as well as the total privacy calculation. Pyvacy tracks the privacy loss with an RDP accountant, which is a PyTorch implementation based on TensorFlow Privacy. We import the scikit-learn package for implementing the logistic regression classifier (e.g. from sklearn.linear_model import LogisticRegression) with default parameter settings. |
| Experiment Setup | Yes | For all datasets we use, we set the subsampling rate as 0.1, training iterations as 300, and the noise multiplier as 1.25 to target (10, 10^-5)-DP and 4.5 to target (1, 10^-5)-DP, respectively. With better evaluation performance on the validation set, gradient clipping norms are set as 0.1 for MNIST and Fashion MNIST, 0.01 for CelebA, and 10 for CelebA-HQ. Table 5: Network configurations for different datasets in the experiments. #h_conv denotes the hidden sizes in the convolutional layers. #h_lin denotes the hidden sizes in the linear layers. #c denotes the length of the latent code. #b denotes the number of blocks in the flow. |
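The gradient-perturbation step quoted in the Pseudocode row (per-example l2 clipping followed by Gaussian noise, averaged over the batch) can be sketched in NumPy as below. This is an illustrative reconstruction, not the authors' code: `dp_sgd_step` and its argument names are hypothetical, and the hyperparameter values mirror the paper's MNIST setting (clipping norm C = 0.1, noise multiplier σ = 1.25).

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_bound, noise_multiplier, rng):
    """One DP-SGD gradient-perturbation step in the style of Algorithm 1.

    per_example_grads: (B, d) array, one gradient row per example.
    clip_bound:        C, the l2 gradient clipping bound.
    noise_multiplier:  sigma.
    """
    B, d = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Clip each per-example gradient: g_i <- g_i * min(1, C / ||g_i||_2)
    scale = np.minimum(1.0, clip_bound / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum, add Gaussian noise N(0, sigma^2 C^2 I), then average over the batch
    noise = rng.normal(0.0, noise_multiplier * clip_bound, size=d)
    return (clipped.sum(axis=0) + noise) / B

# Stand-in per-example gradients with the paper's MNIST settings
rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 10))
noisy_grad = dp_sgd_step(grads, clip_bound=0.1, noise_multiplier=1.25, rng=rng)
```

With `noise_multiplier=0` the function reduces to plain clipped-gradient averaging, which makes the clipping behavior easy to check in isolation; the privacy accounting itself (the RDP accountant mentioned in the Software Dependencies row) is a separate component and is not sketched here.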