Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline and validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Assessing Robustness via Score-Based Adversarial Image Generation
Authors: Marcel Kollovieh, Lukas Gosch, Marten Lienen, Yan Scholten, Leo Schwinn, Stephan Günnemann
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical evaluation demonstrates that ScoreAG improves upon the majority of state-of-the-art attacks and defenses across multiple benchmarks. This work highlights the importance of investigating adversarial examples bounded by semantics rather than ℓp-norm constraints. ScoreAG represents an important step towards more encompassing robustness assessments. [...] 4 Experimental Evaluation The primary objective of our experimental evaluation is to assess the capability of ScoreAG in generating and purifying adversarial examples. |
| Researcher Affiliation | Academia | Marcel Kollovieh1,2,3, Lukas Gosch1,2,3, Marten Lienen1, Yan Scholten1,2, Leo Schwinn1,2, Stephan Günnemann1,2,3 — 1School of Computation, Information and Technology, Technical University of Munich, 2Munich Data Science Institute, 3Munich Center for Machine Learning |
| Pseudocode | Yes | A.3 Pseudocode We present the pseudocode of ScoreAG in Alg. 1, implementing the sampler proposed by Karras et al. (2022). Here, s denotes the scale parameter for the task, while tᵢ and γᵢ are scheduler parameters retained from the original configuration (see Tab. 7). [...] Algorithm 1 ScoreAG with the sampler of Karras et al. (2022). |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described in this paper, nor does it explicitly state that the code is being released or is available in supplementary materials. It only mentions using implementations of official repositories for baselines. |
| Open Datasets | Yes | We employ three benchmark datasets for our experiments: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and Tiny ImageNet. [...] Additionally, we evaluate ScoreAG on the high-resolution ImageNet-Compatible³ dataset, a commonly used subset of ImageNet. ³https://github.com/cleverhans-lab/cleverhans/tree/master/cleverhans_v3.1.0/examples/nips17_adversarial_competition/dataset |
| Dataset Splits | Yes | We employ three benchmark datasets for our experiments: CIFAR10, CIFAR100 (Krizhevsky et al., 2009), and Tiny ImageNet. We utilize pre-trained Elucidating Diffusion Models (EDM) in the variance-preserving (VP) setup (Karras et al., 2022; Wang et al., 2023) for image generation. As our classifier, we opt for the well-established WideResNet architecture WRN-28-10 (Zagoruyko & Komodakis, 2016). |
| Hardware Specification | Yes | B.8 Runtime comparison of the attacks. All experiments were conducted on A100s. |
| Software Dependencies | No | Our models are implemented using PyTorch with the pre-trained EDM models by Karras et al. (2022) and Wang et al. (2023), and the guidance scores are computed using automatic differentiation. In Tab. 5 and Tab. 6, we give an overview of the hyperparameters of ScoreAG. For the methods DiffAttack, DiffPure, CAA, PPGD, and LPA, we use the corresponding authors' official implementations with the suggested hyperparameters. For the remaining baselines, we use Torchattacks (Kim, 2020). |
| Experiment Setup | Yes | The classifiers are trained for 400 epochs using SGD with Nesterov momentum of 0.9 and weight decay of 5·10⁻⁴. Additionally, we incorporate a cyclic learning rate scheduler with cosine annealing (Smith & Topin, 2019) with an initial learning rate of 0.2. To further stabilize the training process, we apply an exponential moving average with a decay rate of 0.995. Each classifier is trained four times to ensure the reproducibility of our results, and we report standard deviations with (±). [...] Table 5: Hyperparameters used to train the WRN-28-10 classifiers. [...] Table 6: Hyperparameters used to evaluate ScoreAG. |
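The Pseudocode excerpt states that the paper's Algorithm 1 implements the sampler of Karras et al. (2022) with a task-specific scale parameter s. As a hedged illustration only (not the authors' implementation), the sketch below shows the general shape of a score-guided Euler-style sampling loop in plain Python; the toy score function, the optional `guidance_fn` hook scaled by `s`, and the geometric noise schedule are assumptions for illustration.

```python
def toy_score(x, sigma):
    # Toy score of a standard Gaussian prior: grad log p(x) = -x / (1 + sigma^2)
    return [-xi / (1.0 + sigma * sigma) for xi in x]

def guided_sampler(x, sigmas, score_fn, guidance_fn=None, s=1.0):
    """Minimal Euler-style sampler sketch (hypothetical, not the paper's Alg. 1).

    x           -- current sample (list of floats)
    sigmas      -- decreasing noise levels, ending at 0
    score_fn    -- score estimate score(x, sigma)
    guidance_fn -- optional extra guidance term, scaled by s (assumption)
    """
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        score = score_fn(x, sigma)
        if guidance_fn is not None:
            g = guidance_fn(x, sigma)
            score = [sc + s * gi for sc, gi in zip(score, g)]
        # Probability-flow ODE drift: dx/dsigma = -sigma * score(x, sigma)
        d = [-sigma * sc for sc in score]
        x = [xi + (sigma_next - sigma) * di for xi, di in zip(x, d)]
    return x

# Geometric noise schedule decaying from sigma_max = 80 toward 0
sigmas = [80.0 * (0.05 ** (i / 17)) for i in range(18)] + [0.0]
sample = guided_sampler([3.0, -2.0], sigmas, toy_score)
```

With the toy Gaussian score, the loop contracts the sample toward the prior mean as the noise level decreases; the paper's actual algorithm replaces the toy score with a pre-trained EDM and uses guidance to steer generation toward adversarial or purified images.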
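The Experiment Setup excerpt reports concrete training hyperparameters: cosine-annealed learning rate starting at 0.2, 400 epochs, and an EMA decay of 0.995. The sketch below illustrates those two scheduling pieces in plain Python under stated assumptions: the paper uses a cyclic scheduler with cosine annealing, while this simplified version is a single cosine decay, and `lr_min = 0` is an assumption; the SGD/Nesterov update itself is omitted.

```python
import math

def cosine_annealed_lr(step, total_steps, lr_init=0.2, lr_min=0.0):
    """Simplified single-cycle cosine decay (the paper uses a cyclic variant)."""
    t = step / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * t))

def ema_update(ema_params, params, decay=0.995):
    """Exponential moving average of parameters, as used to stabilize training."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# Schedule starts at the reported initial rate 0.2 and decays to lr_min at the end.
lrs = [cosine_annealed_lr(s, 400) for s in range(401)]
ema = ema_update([0.0], [1.0])  # one EMA step with decay 0.995
```

In a real training loop the schedule would be queried once per step (or epoch) and the EMA update applied to the classifier weights after each optimizer step.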