BIGRoC: Boosting Image Generation via a Robust Classifier
Authors: Roy Ganz, Michael Elad
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this post-processing algorithm on various image synthesis methods and show a significant quantitative and qualitative improvement on CIFAR-10 and ImageNet. Specifically, BIGRoC improves the best performing image synthesis diffusion model on ImageNet 128×128 by 14.81%, attaining an FID score of 2.53, and on 256×256 by 7.87%, achieving an FID of 3.63. Moreover, we conduct an opinion survey, according to which humans significantly prefer our method's outputs. |
| Researcher Affiliation | Academia | Roy Ganz EMAIL Electrical Engineering Department Technion Michael Elad EMAIL Computer Science Department Technion |
| Pseudocode | Yes | Algorithm 1 Targeted Projected Gradient Descent (PGD). Input: classifier fθ, input x, target class ŷ, radius ϵ, step size α, number of iterations T. Initialize δ₀ ← 0; for t from 0 to T do δₜ₊₁ = Πϵ(δₜ − α∇δ ℓ(fθ(x + δₜ), ŷ)); end for; x_adv = x + δ_T. Output: x_adv |
| Open Source Code | No | In this work, except for three basic generative models (see description in C.1), we did not train models and used only available pretrained ones from verified sources. For quantitative evaluation, we use common implementations of IS and FID metrics. |
| Open Datasets | Yes | In this section, we present experiments that demonstrate the effectiveness of our method on the most common datasets for image synthesis: CIFAR-10 (Krizhevsky, 2012) and ImageNet (Deng et al., 2009). |
| Dataset Splits | No | While the above-described approach for unconditional sampling works well, it could be further improved. We have noticed that in this case, estimating the target classes Y of X_gen via fθ might lead to unbalanced labels. For example, on CIFAR-10, when generating 50,000 samples, we expect approximately 5,000 images for each of the ten classes, and yet the estimated labels do not distribute uniformly at all. |
| Hardware Specification | No | In all of our experiments, we use the Google COLAB service with a single GPU. |
| Software Dependencies | No | Adversarial Robust Classifier: We use a pretrained robust ResNet-50 on CIFAR-10, provided in (Engstrom et al., 2019). This model is adversarially trained with a threat model Δ = {δ : ‖δ‖₂ ≤ 0.5} with step size α of 0.1. |
| Experiment Setup | Yes | BIGRoC hyperparameters: As stated above, we use the same robust classifier in all our experiments. Thus, the remaining hyperparameters to tune are ϵ, the step size, and the number of steps. We empirically find that the number of steps is relatively marginal (see Section A.1), and thus we opt to use 7. In all the experiments, we fix the step size to be 1.5·ϵ/num_steps. Since we test various image generators of different qualities, each requires a particular amount of visual enhancement, defined by the value of ϵ. Low-quality generators require substantial improvement and therefore benefit from high ϵ values, while better ones benefit from smaller values. Table 7 (BIGRoC's ϵ values, after normalization, in the CIFAR-10 experiments): VAE: 25; DCGAN: 5; WGAN-GP: 5; SNGAN: 3; SSGAN: 3; cGAN: 5; cGAN-PD: 2; BigGAN: 1; DiffBigGAN: 1. |
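The targeted PGD procedure quoted under "Pseudocode" can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: a toy linear softmax classifier stands in for the robust ResNet-50, and the function name, model, and default hyperparameters are assumptions for demonstration. It does follow the quoted setup of an L2 projection Πϵ and a step size of 1.5·ϵ/num_steps.

```python
import numpy as np

def targeted_pgd(x, W, b, target, eps=1.0, steps=7, alpha=None):
    """Sketch of Algorithm 1 (targeted PGD) on a toy linear softmax
    classifier f(x) = softmax(W x + b). The linear model is a stand-in
    for the paper's robust ResNet-50. Per the quoted setup, the step
    size defaults to 1.5 * eps / steps."""
    if alpha is None:
        alpha = 1.5 * eps / steps
    delta = np.zeros_like(x)                  # delta_0 <- 0
    for _ in range(steps):
        logits = W @ (x + delta) + b
        p = np.exp(logits - logits.max())     # stable softmax
        p /= p.sum()
        # Gradient of cross-entropy toward `target` wrt the input:
        # W^T (p - onehot(target)).
        grad = W.T @ (p - np.eye(len(b))[target])
        delta = delta - alpha * grad          # descend the targeted loss
        norm = np.linalg.norm(delta)
        if norm > eps:                        # projection Pi_eps onto the L2 ball
            delta = delta * (eps / norm)
    return x + delta
```

The update descends the classification loss toward the target class (rather than ascending it, as in an untargeted attack), which is what lets BIGRoC use the robust classifier's perceptually aligned gradients to sharpen class-relevant features in generated images.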