Differentially Private Latent Diffusion Models
Authors: Michael F Liu, Saiyue Lyu, Margarita Vinaroz, Mijung Park
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method is capable of generating quality images in various scenarios. We perform an in-depth analysis of the ablation of DP-LDM to explore the strategy for reducing parameters for more applicable training of DP-SGD. Based on our promising results, we conclude that fine-tuning LDMs is an efficient and effective framework for DP generative learning. We hope our results can contribute to future research in DP data generation, considering the rapid advances in diffusion-based generative modelling. |
| Researcher Affiliation | Academia | Michael Liu (Department of Computer Science, University of British Columbia); Saiyue Lyu (Department of Computer Science, University of British Columbia); Margarita Vinaroz (University of Tübingen; International Max Planck Research School for Intelligent Systems, IMPRS-IS); Mijung Park (Department of Computer Science, University of British Columbia) |
| Pseudocode | Yes | Algorithm 1 DP-LDM. Input: latent representations and conditions (if conditional: {(z_i, y_i)}_{i=1..N}), a pre-trained model θ, number of iterations P, mini-batch size B, clipping norm C, learning rate η, privacy parameter σ corresponding to (ϵ, δ)-DP. Denote θ̂ = {θ_Attn, θ_Cn}. For p = 1 to P: Step 1. Take a mini-batch B_p uniformly at random with sampling probability q = B/N. Step 2. For each sample i ∈ B_p, compute the gradient g_p(z_i, y_i) = ∇_{θ̂_p} L_ldm(θ̂_p, z_i, y_i). Step 3. Clip the gradient: ĝ_p(z_i, y_i) = g_p(z_i, y_i) / max(1, ‖g_p(z_i, y_i)‖₂ / C). Step 4. Add noise: g̃_p = (1/B)(Σ_{i∈B_p} ĝ_p(z_i, y_i) + N(0, σ²C²I)). Step 5. Gradient descent: θ̂_{p+1} = θ̂_p − η g̃_p. End for. Return: (ϵ, δ)-differentially private θ̂_P = {θ_Attn,P, θ_Cn,P} |
| Open Source Code | Yes | Our code is available at https://github.com/ParkLabML/DP-LDM. |
| Open Datasets | Yes | Dataset licenses: MNIST: CC BY-SA 3.0; CelebA: see https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html; CIFAR-10: MIT; Camelyon17: CC0. Table 1: Private and public dataset pairs, with corresponding evaluation metric and choices of classifiers. |
| Dataset Splits | Yes | All the classifiers are trained with 50K synthetic samples and then evaluated on real data samples. For each dataset, we follow previous work to choose classifier models for a fair comparison. |
| Hardware Specification | Yes | For instance, to generate the CIFAR10 synthetic images using an NVIDIA V100 32GB, a recent work called DP-API by Lin et al. (2023) requires 500 GPU hours and DP-Diffusion requires 1250 GPU hours (See Figure 42 in (Lin et al., 2023)). In our case, when using an NVIDIA RTX A4000 16GB GPU (slower than V100 32GB), fine-tuning took 15 GPU hours, and pre-training took 192 GPU hours. |
| Software Dependencies | No | We implemented DP-LDMs in PyTorch Lightning (Paszke et al., 2019) building on the LDM codebase by Rombach et al. (2022) and Opacus (Yousefpour et al., 2021) for DP-SGD training. ... For our experiments incorporating LoRA, we use the loralib (Hu et al., 2021) Python library. |
| Experiment Setup | Yes | B Hyperparameters: Here we provide an overview of the hyperparameters of the pretrained autoencoder in Table 17 and hyperparameters of the pretrained diffusion models in Table 18. Table 19: Hyperparameters for fine-tuning diffusion models with DP constraints ϵ ∈ {10, 1} and δ = 10⁻⁵ on MNIST. The ablation hyperparameter determines which attention modules are fine-tuned, where a value of i means that the first i − 1 attention modules are frozen and the others are trained. Setting ablation to 1 (default) fine-tunes all attention modules. |
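The DP-SGD update in the pseudocode row (clip per-sample gradients, average, add Gaussian noise, descend) can be sketched as below. This is a minimal NumPy illustration of the generic clip-and-noise step, not the paper's actual implementation (which uses Opacus on the LDM attention/conditioning parameters); `dp_sgd_step` and its arguments are hypothetical names for illustration.

```python
import numpy as np

def dp_sgd_step(theta, per_sample_grads, clip_norm, sigma, lr, rng):
    """One DP-SGD update: clip, aggregate, add noise, gradient descent.

    theta: parameter vector of shape (d,)
    per_sample_grads: array of shape (B, d), one gradient row per example
    clip_norm: L2 clipping bound C; sigma: noise multiplier; lr: learning rate
    """
    B, d = per_sample_grads.shape
    # Step 3: rescale each per-sample gradient so its L2 norm is at most C.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads / np.maximum(1.0, norms / clip_norm)
    # Step 4: sum the clipped gradients, add N(0, sigma^2 C^2 I) noise, average.
    noisy_mean = (clipped.sum(axis=0)
                  + rng.normal(0.0, sigma * clip_norm, size=d)) / B
    # Step 5: plain gradient descent on the fine-tuned parameters.
    return theta - lr * noisy_mean
```

With `sigma = 0` the step reduces to ordinary clipped SGD, which makes the clipping behaviour easy to check on toy gradients.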
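The ablation hyperparameter described in the experiment-setup row (freeze the first i − 1 attention modules, train the rest) can be sketched as follows. The `attn_modules` list and `trainable` flag are hypothetical stand-ins for the LDM attention blocks, assumed purely for illustration.

```python
def apply_ablation(attn_modules, ablation):
    """Freeze the first (ablation - 1) attention modules; train the rest.

    attn_modules: ordered list of objects exposing a `trainable` flag.
    ablation = 1 (the default) leaves every module trainable.
    """
    for idx, module in enumerate(attn_modules, start=1):
        # Modules before position `ablation` are frozen; the rest are trained.
        module.trainable = idx >= ablation
```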