Data Augmentation Can Improve Robustness

Authors: Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, Timothy A Mann

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on CIFAR-10 against ℓ∞ and ℓ2 norm-bounded perturbations of size ϵ = 8/255 and ϵ = 128/255, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. We conduct thorough experiments to show that our approach generalizes across architectures, datasets and threat models. (A sketch of these perturbation constraints appears after the table.)
Researcher Affiliation | Industry | Sylvestre-Alvise Rebuffi*, Sven Gowal*, Dan Calian, Florian Stimberg, Olivia Wiles and Timothy Mann, DeepMind, London
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The code written in JAX [4] and Haiku [26] is available online at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
Open Datasets | Yes | We evaluate our approach on CIFAR-10 against ℓ∞ and ℓ2 norm-bounded perturbations of size ϵ = 8/255 and ϵ = 128/255, respectively. We also achieve a significant performance boost with this approach while using other architectures and datasets such as CIFAR-100, SVHN and TinyImageNet.
Dataset Splits | Yes | Specifically, we train two (and only two) models for each hyperparameter setting, perform early stopping for each model on a separate validation set of 1024 samples using PGD40 similarly to Rice et al. [44] and pick the best model by evaluating the robust accuracy on the same validation set. (A sketch of this hold-out split appears after the table.)
Hardware Specification | Yes | We train for 400 epochs with a batch size of 512 split over 32 Google Cloud TPU v3 cores [4], and the learning rate is initially set to 0.1 and decayed by a factor 10 two-thirds-of-the-way through training.
Software Dependencies | No | The paper mentions JAX [4] and Haiku [26] but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We train for 400 epochs with a batch size of 512 split over 32 Google Cloud TPU v3 cores [4], and the learning rate is initially set to 0.1 and decayed by a factor 10 two-thirds-of-the-way through training. We scale the learning rates using the linear scaling rule of Goyal et al. [21] (i.e., effective LR = max(LR × batch size / 256, LR)). The decay rate of WA is set to τ = 0.999. (A sketch of the schedule and weight averaging appears after the table.)
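
The threat models quoted in the Research Type and Open Datasets rows bound adversarial perturbations by an ℓ∞ ball of radius 8/255 or an ℓ2 ball of radius 128/255. Below is a minimal JAX sketch of projecting a perturbation onto each ball; the function names, shapes, and example usage are illustrative assumptions, not the authors' code.

```python
import jax.numpy as jnp

def project_linf(delta, epsilon=8 / 255):
    # l-inf ball: clip every coordinate of the perturbation to [-epsilon, epsilon].
    return jnp.clip(delta, -epsilon, epsilon)

def project_l2(delta, epsilon=128 / 255):
    # l2 ball: rescale the perturbation so its l2 norm is at most epsilon.
    norm = jnp.linalg.norm(delta.reshape(-1))
    scale = jnp.minimum(1.0, epsilon / jnp.maximum(norm, 1e-12))
    return delta * scale

# Example on a CIFAR-10-shaped perturbation (32x32x3).
delta = 0.1 * jnp.ones((32, 32, 3))
print(float(jnp.abs(project_linf(delta)).max()))   # <= 8/255 ≈ 0.0314
print(float(jnp.linalg.norm(project_l2(delta))))   # <= 128/255 ≈ 0.502
```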
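
For the Dataset Splits row, the paper holds out 1,024 training images as a validation set for early stopping on PGD40 robust accuracy. A minimal sketch of such a hold-out split follows; the random seed and NumPy-style indexing are assumptions and do not reproduce the authors' exact split.

```python
import numpy as np

def split_train_val(images, labels, num_val=1024, seed=0):
    # Hold out `num_val` examples from the training set as a validation set
    # for early stopping; the remaining examples are used for training.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(images))
    val_idx, train_idx = perm[:num_val], perm[num_val:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Example with CIFAR-10-sized placeholder arrays (50,000 training examples).
images = np.zeros((50000, 32, 32, 3), dtype=np.uint8)
labels = np.zeros((50000,), dtype=np.int64)
(train_x, _), (val_x, _) = split_train_val(images, labels)
print(train_x.shape, val_x.shape)  # (48976, 32, 32, 3) (1024, 32, 32, 3)
```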
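
The Experiment Setup row quotes a step learning-rate schedule (0.1, decayed by a factor of 10 two-thirds of the way through training), the linear scaling rule effective LR = max(LR × batch size / 256, LR), and model weight averaging (WA) with decay τ = 0.999. Here is a hedged sketch of those three pieces in JAX-style Python; the function names and the example at the end are assumptions, not the authors' training loop.

```python
import jax
import jax.numpy as jnp

def effective_lr(base_lr=0.1, batch_size=512):
    # Linear scaling rule of Goyal et al.: effective LR = max(LR * batch_size / 256, LR).
    return max(base_lr * batch_size / 256, base_lr)

def lr_schedule(step, total_steps, base_lr=0.1, batch_size=512):
    # Constant learning rate, decayed by a factor of 10 two-thirds of the way through training.
    lr = effective_lr(base_lr, batch_size)
    return lr * jnp.where(step < (2 * total_steps) // 3, 1.0, 0.1)

def update_weight_average(avg_params, params, tau=0.999):
    # Weight averaging (WA) as an exponential moving average of parameters with decay tau.
    return jax.tree_util.tree_map(lambda a, p: tau * a + (1.0 - tau) * p,
                                  avg_params, params)

# Example: schedule values at the start and after the decay point,
# for roughly 400 epochs of CIFAR-10 at batch size 512.
total_steps = 400 * (50000 // 512)
print(float(lr_schedule(0, total_steps)))                # 0.2 (= 0.1 * 512 / 256)
print(float(lr_schedule(total_steps - 1, total_steps)))  # 0.02
```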