Data Augmentation Can Improve Robustness

Authors: Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, Timothy A Mann

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on CIFAR-10 against ℓ∞ and ℓ2 norm-bounded perturbations of size ϵ = 8/255 and ϵ = 128/255, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. We conduct thorough experiments to show that our approach generalizes across architectures, datasets and threat models. (A sketch of these perturbation constraints appears after the table.)
Researcher Affiliation | Industry | Sylvestre-Alvise Rebuffi*, Sven Gowal*, Dan Calian, Florian Stimberg, Olivia Wiles and Timothy Mann, DeepMind, London
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The code written in JAX [4] and Haiku [26] is available online at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
Open Datasets | Yes | We evaluate our approach on CIFAR-10 against ℓ∞ and ℓ2 norm-bounded perturbations of size ϵ = 8/255 and ϵ = 128/255, respectively. We also achieve a significant performance boost with this approach while using other architectures and datasets such as CIFAR-100, SVHN and TinyImageNet.
Dataset Splits | Yes | Specifically, we train two (and only two) models for each hyperparameter setting, perform early stopping for each model on a separate validation set of 1024 samples using PGD40 similarly to Rice et al. [44] and pick the best model by evaluating the robust accuracy on the same validation set. (A sketch of this hold-out split appears after the table.)
Hardware Specification | Yes | We train for 400 epochs with a batch size of 512 split over 32 Google Cloud TPU v3 cores [4], and the learning rate is initially set to 0.1 and decayed by a factor 10 two-thirds-of-the-way through training.
Software Dependencies | No | The paper mentions JAX [4] and Haiku [26] but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We train for 400 epochs with a batch size of 512 split over 32 Google Cloud TPU v3 cores [4], and the learning rate is initially set to 0.1 and decayed by a factor 10 two-thirds-of-the-way through training. We scale the learning rates using the linear scaling rule of Goyal et al. [21] (i.e., effective LR = max(LR × batch size / 256, LR)). The decay rate of WA is set to τ = 0.999. (A sketch of the schedule and weight averaging appears after the table.)
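
The threat models quoted in the Research Type and Open Datasets rows bound adversarial perturbations by an ℓ∞ ball of radius 8/255 or an ℓ2 ball of radius 128/255. Below is a minimal JAX sketch of projecting a perturbation onto each ball; the function names, shapes, and example usage are illustrative assumptions, not the authors' code.

```python
import jax.numpy as jnp

def project_linf(delta, epsilon=8 / 255):
    # l-inf ball: clip every coordinate of the perturbation to [-epsilon, epsilon].
    return jnp.clip(delta, -epsilon, epsilon)

def project_l2(delta, epsilon=128 / 255):
    # l2 ball: rescale the perturbation so its l2 norm is at most epsilon.
    norm = jnp.linalg.norm(delta.reshape(-1))
    scale = jnp.minimum(1.0, epsilon / jnp.maximum(norm, 1e-12))
    return delta * scale

# Example on a CIFAR-10-shaped perturbation (32x32x3).
delta = 0.1 * jnp.ones((32, 32, 3))
print(float(jnp.abs(project_linf(delta)).max()))   # <= 8/255 ≈ 0.0314
print(float(jnp.linalg.norm(project_l2(delta))))   # <= 128/255 ≈ 0.502
```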
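
For the Dataset Splits row, the paper holds out 1,024 training images as a validation set for early stopping on PGD40 robust accuracy. A minimal sketch of such a hold-out split follows; the random seed and NumPy-style indexing are assumptions and do not reproduce the authors' exact split.

```python
import numpy as np

def split_train_val(images, labels, num_val=1024, seed=0):
    # Hold out `num_val` examples from the training set as a validation set
    # for early stopping; the remaining examples are used for training.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(len(images))
    val_idx, train_idx = perm[:num_val], perm[num_val:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Example with CIFAR-10-sized placeholder arrays (50,000 training examples).
images = np.zeros((50000, 32, 32, 3), dtype=np.uint8)
labels = np.zeros((50000,), dtype=np.int64)
(train_x, _), (val_x, _) = split_train_val(images, labels)
print(train_x.shape, val_x.shape)  # (48976, 32, 32, 3) (1024, 32, 32, 3)
```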
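
The Experiment Setup row quotes a step learning-rate schedule (0.1, decayed by a factor of 10 two-thirds of the way through training), the linear scaling rule effective LR = max(LR × batch size / 256, LR), and model weight averaging (WA) with decay τ = 0.999. Here is a hedged sketch of those three pieces in JAX-style Python; the function names and the example at the end are assumptions, not the authors' training loop.

```python
import jax
import jax.numpy as jnp

def effective_lr(base_lr=0.1, batch_size=512):
    # Linear scaling rule of Goyal et al.: effective LR = max(LR * batch_size / 256, LR).
    return max(base_lr * batch_size / 256, base_lr)

def lr_schedule(step, total_steps, base_lr=0.1, batch_size=512):
    # Constant learning rate, decayed by a factor of 10 two-thirds of the way through training.
    lr = effective_lr(base_lr, batch_size)
    return lr * jnp.where(step < (2 * total_steps) // 3, 1.0, 0.1)

def update_weight_average(avg_params, params, tau=0.999):
    # Weight averaging (WA) as an exponential moving average of parameters with decay tau.
    return jax.tree_util.tree_map(lambda a, p: tau * a + (1.0 - tau) * p,
                                  avg_params, params)

# Example: schedule values at the start and after the decay point,
# for roughly 400 epochs of CIFAR-10 at batch size 512.
total_steps = 400 * (50000 // 512)
print(float(lr_schedule(0, total_steps)))                # 0.2 (= 0.1 * 512 / 256)
print(float(lr_schedule(total_steps - 1, total_steps)))  # 0.02
```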