Private GANs, Revisited

Authors: Alex Bie, Gautam Kamath, Guojun Zhang

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Our results demonstrate that on standard image synthesis benchmarks, DPSGD outperforms all alternative GAN privatization schemes. Code: https://github.com/alexbie98/dpgan-revisit. ... We plot in Figures 1a and 2 the evolution of FID and accuracy during DPGAN training for both MNIST and Fashion MNIST, under varying discriminator update frequencies n_D. ... In Figure 3, we plot the evolution of generated images for an n_D = 10 run over the course of training and observe qualitative evidence of mode collapse ... We scale up batch sizes, considering B ∈ {128, 512, 2048}, and search for the optimal noise level σ and n_D (details in Appendix B.2).
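The discriminator update frequency n_D referenced above means the discriminator takes n_D privatized steps for every one generator step. A minimal sketch of that alternation (function and label names are illustrative, not from the paper's code):

```python
from itertools import islice


def update_schedule(n_D):
    """Infinite training schedule: n_D discriminator ("D") updates,
    then one generator ("G") update, repeating -- the update
    frequency varied in the experiments quoted above."""
    while True:
        for _ in range(n_D):
            yield "D"
        yield "G"


# For n_D = 2, the first six steps are: D, D, G, D, D, G
print(list(islice(update_schedule(2), 6)))
```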
Researcher Affiliation Collaboration Alex Bie EMAIL University of Waterloo Gautam Kamath EMAIL University of Waterloo Guojun Zhang EMAIL Huawei Noah's Ark Lab
Pseudocode Yes Algorithm 1 TrainDPGAN(D; ϕ0, θ0, Opt_D, Opt_G, n_D, T, B, C, σ, δ)
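Algorithm 1 privatizes the discriminator with DPSGD: each update clips per-example gradients to L2 norm C and adds Gaussian noise of scale σC, matching the C and σ in the signature above. A self-contained numpy sketch of that single step (interface and naming are illustrative, not the paper's implementation):

```python
import numpy as np


def dpsgd_discriminator_step(per_sample_grads, C, sigma, rng):
    """One DPSGD update direction: clip each per-example gradient to
    L2 norm at most C, sum, add N(0, (sigma * C)^2) noise, and average
    over the batch. Returns the noisy mean gradient."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the clipping norm C.
        clipped.append(g * min(1.0, C / (norm + 1e-12)))
    B = len(per_sample_grads)
    noise = sigma * C * rng.standard_normal(per_sample_grads[0].shape)
    return (np.sum(clipped, axis=0) + noise) / B


rng = np.random.default_rng(0)
grads = [np.full(4, 10.0), np.full(4, 0.1)]  # one large, one small gradient
step = dpsgd_discriminator_step(grads, C=1.0, sigma=1.0, rng=rng)
```

With σ = 0 the step reduces to plain clipped-gradient averaging, which makes the clipping behaviour easy to check in isolation.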
Open Source Code Yes Code: https://github.com/alexbie98/dpgan-revisit.
Open Datasets Yes We focus on labelled generation of MNIST (LeCun et al., 1998) and Fashion MNIST (Xiao et al., 2017), both of which are comprised of 60K 28×28 grayscale images divided into 10 classes. ... The CelebA dataset (Liu et al., 2015) consists of 202,599 178×218 RGB images of celebrity faces, each labelled with 40 binary attributes.
Dataset Splits Yes We focus on labelled generation of MNIST ... and Fashion MNIST ..., both of which are comprised of 60K 28×28 grayscale images divided into 10 classes. ... To measure downstream task utility, we again follow prior work, and train a CNN classifier on 60K generated image-label pairs and report its accuracy on the real test set. ... The CelebA dataset ... is obtained by resizing to 32×32 and labelling with the gender attribute. The 202,599 images are partitioned into a training set of size 182,637 and a test set of size 19,962.
Hardware Specification Yes We report wall clock times for training runs under various hyperparameter settings, which are executed on 1 NVIDIA A40 card setups running PyTorch 1.11.0+CUDA 11.3.1 and Opacus 1.1.3.
Software Dependencies Yes We report wall clock times for training runs under various hyperparameter settings, which are executed on 1 NVIDIA A40 card setups running PyTorch 1.11.0+CUDA 11.3.1 and Opacus 1.1.3.
Experiment Setup Yes For MNIST and Fashion MNIST, we begin from an open source PyTorch (Paszke et al., 2019) implementation of DCGAN (Radford et al., 2016) ... This includes: batch size B = 128, the Adam optimizer (Kingma & Ba, 2015) with parameters (α = 0.0002, β1 = 0.5, β2 = 0.999) for both G and D, the non-saturating GAN loss (Goodfellow et al., 2014), and a 5-layer fully convolutional architecture with width parameter d = 128. ... For our baseline setting, we use the following DPSGD hyperparameters: we keep the non-private (expected) batch size B = 128, and use a noise level σ = 1 and clipping norm C = 1.
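For convenience, the baseline hyperparameters quoted above can be collected into a single configuration sketch; the dictionary keys are illustrative labels, not identifiers from the authors' repository:

```python
# Baseline DPGAN setup for MNIST / Fashion MNIST, transcribed from the
# quoted experiment description. Key names are illustrative.
BASELINE_CONFIG = {
    "batch_size": 128,        # non-private (expected) batch size B
    "optimizer": "Adam",      # for both G and D
    "adam_alpha": 0.0002,     # learning rate α
    "adam_beta1": 0.5,
    "adam_beta2": 0.999,
    "loss": "non-saturating", # Goodfellow et al. (2014) GAN loss
    "architecture": "DCGAN",  # 5-layer fully convolutional
    "width_d": 128,           # width parameter d
    "noise_level_sigma": 1.0, # DPSGD noise level σ
    "clip_norm_C": 1.0,       # DPSGD clipping norm C
}
```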