BIGRoC: Boosting Image Generation via a Robust Classifier
Authors: Roy Ganz, Michael Elad
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this post-processing algorithm on various image synthesis methods and show a significant quantitative and qualitative improvement on CIFAR-10 and ImageNet. Specifically, BIGRoC improves the best performing image synthesis diffusion model on ImageNet 128×128 by 14.81%, attaining an FID score of 2.53, and on 256×256 by 7.87%, achieving an FID of 3.63. Moreover, we conduct an opinion survey, according to which humans significantly prefer our method's outputs. |
| Researcher Affiliation | Academia | Roy Ganz EMAIL Electrical Engineering Department Technion Michael Elad EMAIL Computer Science Department Technion |
| Pseudocode | Yes | Algorithm 1 Targeted Projected Gradient Descent (PGD). Input: classifier fθ, input x, target class ŷ, radius ϵ, step size α, number of iterations T. Initialize δ₀ ← 0; for t from 0 to T do δₜ₊₁ = Πϵ(δₜ − α∇δ ℓ(fθ(x + δₜ), ŷ)); end for; x_adv = x + δ_T. Output: x_adv |
| Open Source Code | No | In this work, except for three basic generative models (see description in C.1), we did not train models and used only available pretrained ones from verified sources. For quantitative evaluation, we use common implementations of IS and FID metrics. |
| Open Datasets | Yes | In this section, we present experiments that demonstrate the effectiveness of our method on the most common datasets for image synthesis: CIFAR-10 (Krizhevsky, 2012) and ImageNet (Deng et al., 2009). |
| Dataset Splits | No | While the above-described approach for unconditional sampling works well, it could be further improved. We have noticed that in this case, estimating the target classes Y of X_gen via fθ might lead to unbalanced labels. For example, on CIFAR-10, when generating 50,000 samples, we expect approximately 5,000 images for each of the ten classes, and yet the estimated labels do not distribute uniformly at all. |
| Hardware Specification | No | In all of our experiments, we use the Google COLAB service with a single GPU. |
| Software Dependencies | No | Adversarial Robust Classifier: We use a pretrained robust ResNet-50 on CIFAR-10, provided in (Engstrom et al., 2019). This model is adversarially trained with a threat model Δ = {δ : ‖δ‖₂ ≤ 0.5} with step size α of 0.1. |
| Experiment Setup | Yes | BIGRoC hyperparameters: As stated above, we use the same robust classifier in all our experiments. Thus, the remaining hyperparameters to tune are ϵ, the step size, and the number of steps. We empirically find that the number of steps is relatively marginal (see Section A.1), and thus we opt to use 7. In all the experiments, we fix the step size to be 1.5·ϵ/num_steps. Since we test various image generators of different qualities, each requires a particular amount of visual enhancement, defined by the value of ϵ. Low-quality generators require substantial improvement and therefore benefit from high ϵ values, while better ones benefit from smaller values. Table 7 (BIGRoC's ϵ values, after normalization, in the CIFAR-10 experiments): VAE: 25; DCGAN: 5; WGAN-GP: 5; SNGAN: 3; SSGAN: 3; cGAN: 5; cGAN-PD: 2; BigGAN: 1; DiffBigGAN: 1. |
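The targeted PGD procedure quoted under "Pseudocode" can be sketched in code. The following is a minimal NumPy illustration, not the paper's implementation: a toy linear softmax classifier stands in for the robust ResNet-50, and the function name, model, and default hyperparameters are assumptions for demonstration. It does follow the quoted setup of an L2 projection Πϵ and a step size of 1.5·ϵ/num_steps.

```python
import numpy as np

def targeted_pgd(x, W, b, target, eps=1.0, steps=7, alpha=None):
    """Sketch of Algorithm 1 (targeted PGD) on a toy linear softmax
    classifier f(x) = softmax(W x + b). The linear model is a stand-in
    for the paper's robust ResNet-50. Per the quoted setup, the step
    size defaults to 1.5 * eps / steps."""
    if alpha is None:
        alpha = 1.5 * eps / steps
    delta = np.zeros_like(x)                  # delta_0 <- 0
    for _ in range(steps):
        logits = W @ (x + delta) + b
        p = np.exp(logits - logits.max())     # stable softmax
        p /= p.sum()
        # Gradient of cross-entropy toward `target` wrt the input:
        # W^T (p - onehot(target)).
        grad = W.T @ (p - np.eye(len(b))[target])
        delta = delta - alpha * grad          # descend the targeted loss
        norm = np.linalg.norm(delta)
        if norm > eps:                        # projection Pi_eps onto the L2 ball
            delta = delta * (eps / norm)
    return x + delta
```

The update descends the classification loss toward the target class (rather than ascending it, as in an untargeted attack), which is what lets BIGRoC use the robust classifier's perceptually aligned gradients to sharpen class-relevant features in generated images.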