Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation

Authors: Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Yuuki Yamanaka, Tomoya Yamashita

ICLR 2025

Reproducibility assessment (Variable: Result, with supporting quotes from the paper):
Research Type: Experimental
"Through experiments across various datasets and settings, we demonstrated that our approach can prevent the generation of sensitive images without compromising image quality." (Section 5, Experiments) "We used the following image datasets: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and CelebA (Liu et al., 2015). ... We used a custom-defined metric called non-sensitive rate and the Fréchet Inception Distance (FID) score (Heusel et al., 2017) as evaluation metrics." "Table 2: Comparison of non-sensitive rates for diffusion models with from-scratch training."
Researcher Affiliation: Industry
Hiroshi Takahashi (NTT Corporation), Tomoharu Iwata (NTT Corporation), Atsutoshi Kumagai (NTT Corporation), Yuuki Yamanaka (NTT Corporation), Tomoya Yamashita (NTT Corporation). Corresponding author: EMAIL
Pseudocode: Yes
"Algorithm 1: Positive-Unlabeled Diffusion Model with Stochastic Gradient Descent"
Open Source Code: Yes
"The code is available at https://github.com/takahashihiroshi/pudm."
Open Datasets: Yes
"We used the following image datasets: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and CelebA (Liu et al., 2015)."
Dataset Splits: Yes
"The training data consist of unlabeled data U and labeled sensitive data S, where U include both normal and sensitive data. Meanwhile, the test data contain only normal data. For example, in MNIST, if we treat even numbers as normal, the unlabeled training data U include both even and odd numbers, the sensitive training data S include only odd numbers, and the test data include only even numbers. The number of data points in each dataset is shown in Table 1."
Hardware Specification: Yes
"We used two machines for the experiments: one with Intel Xeon Platinum 8360Y CPU, 512GB of memory, and NVIDIA A100 SXM4 GPU, and the other with Intel Xeon Gold 6148 CPU, 384GB of memory, and NVIDIA V100 SXM2 GPU."
Software Dependencies: No
"Our implementations are based on Diffusers (von Platen et al., 2022). We optimized these models using AdamW (Loshchilov, 2017) and a cosine scheduler with warmup."
Experiment Setup: Yes
"We optimized these models using AdamW (Loshchilov, 2017) and a cosine scheduler with warmup. We set the learning rate to 10^-4 and the warmup steps to 500. The batch size was 128 for MNIST and CIFAR10, 32 for STL10, and 16 for CelebA. The number of epochs was set to 100 for from-scratch training and 20 for fine-tuning. We set the number of steps T during training to 1,000. For sampling, we used the denoising diffusion probabilistic model (DDPM) scheduler (Ho et al., 2020) for from-scratch training, while we used the denoising diffusion implicit model (DDIM) scheduler (Song et al., 2020a) when fine-tuning the pre-trained models, as the pre-trained models are large and time-consuming to sample from. We set the sampling steps to 1,000 for the DDPM and 50 for the DDIM. For the proposed method, we set β = 0.1."
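The even-odd MNIST split described in the Dataset Splits row can be sketched in plain Python. The helper name `make_pu_splits` and the toy label lists below are illustrative only; the actual per-dataset counts are given in the paper's Table 1.

```python
def make_pu_splits(train, test, is_sensitive):
    """Build the positive-unlabeled setting from the paper's example:
    - U (unlabeled): all training data, normal and sensitive mixed
    - S (sensitive): the labeled sensitive subset of the training data
    - test: normal data only
    """
    U = list(train)  # the unlabeled pool keeps everything
    S = [x for x in train if is_sensitive(x)]
    test_normal = [x for x in test if not is_sensitive(x)]
    return U, S, test_normal

# Toy MNIST-style labels: even digits are normal, odd digits are sensitive.
train_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test_labels = [0, 1, 2, 3, 4]
U, S, test_normal = make_pu_splits(train_labels, test_labels,
                                   lambda y: y % 2 == 1)
print(S)            # [1, 3, 5, 7, 9]
print(test_normal)  # [0, 2, 4]
```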
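The reported optimization schedule (learning rate 10^-4, 500 warmup steps, cosine decay) can be sketched as a plain function. The total-step count below is an assumed placeholder, and the authors most likely relied on Diffusers' built-in scheduler utilities rather than a hand-rolled one; this is only a sketch of the shape of such a schedule.

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=500, total_steps=50_000):
    """Cosine learning-rate schedule with linear warmup (illustrative).

    - Linear ramp from 0 to base_lr over the first `warmup_steps` steps.
    - Cosine decay from base_lr down to 0 over the remaining steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(250))  # halfway through warmup: half the base rate
print(lr_at_step(500))  # end of warmup: the full base rate, 1e-4
```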
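Algorithm 1 itself is not reproduced in this report. As background only, positive-unlabeled methods commonly build on the non-negative PU risk estimator of Kiryo et al. (2017); the sketch below shows that generic estimator with scalar per-term losses. It is not the paper's exact objective, and in particular how the paper's β = 0.1 enters its loss is not shown here.

```python
def nn_pu_risk(loss_pos, loss_pos_as_neg, loss_unl_as_neg, pi):
    """Non-negative PU risk estimator (Kiryo et al., 2017), generic sketch.

    loss_pos        : mean loss of labeled positives treated as positive
    loss_pos_as_neg : mean loss of labeled positives treated as negative
    loss_unl_as_neg : mean loss of unlabeled data treated as negative
    pi              : class prior, the assumed fraction of positives in U
    """
    # Positive-class risk, weighted by the class prior.
    risk_pos = pi * loss_pos
    # Estimated negative-class risk; clamped at zero because the naive
    # estimate can go negative and cause overfitting.
    risk_neg = max(0.0, loss_unl_as_neg - pi * loss_pos_as_neg)
    return risk_pos + risk_neg

# 0.5*0.2 + max(0, 0.5 - 0.5*0.9) ≈ 0.15
print(nn_pu_risk(0.2, 0.9, 0.5, pi=0.5))
```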