On Intriguing Layer-Wise Properties of Robust Overfitting in Adversarial Training

Authors: Duke Nguyen, Chaojian Yu, Vinoth Nandakumar, Young Choon Lee, Tongliang Liu

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the proposed RAT prototype effectively eliminates robust overfitting. The contributions of this work are summarized as follows: ... We design two different realizations of RAT, with extensive experiments on a number of standard benchmarks, verifying its effectiveness.
Researcher Affiliation | Academia | Duke Nguyen (EMAIL), Sydney AI Center, The University of Sydney; Chaojian Yu (EMAIL), Sydney AI Center, The University of Sydney; Vinoth Nandakumar (EMAIL), Sydney AI Center, The University of Sydney; Young Choon Lee (EMAIL), School of Computing, Macquarie University; Tongliang Liu (EMAIL), Sydney AI Center, The University of Sydney
Pseudocode | Yes | Algorithm 1 RAT-prototype (in a mini-batch). Require: base adversarial training algorithm A, optimizer O, network f_w, model parameters w = {w_1, w_2, ..., w_n}, training data D = {(x_i, y_i)}, mini-batch B, front and latter layer conditions C_front and C_latter for f_w, gradient adjustment strategy S.
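The quoted requirements map onto a simple per-mini-batch gradient pass. Below is a minimal Python sketch of that structure; the function and predicate names (`rat_prototype_step`, `is_front`, `is_latter`, `adjust`) and the gradient-halving strategy are illustrative assumptions, not the paper's implementation — the base algorithm A, the layer conditions C_front/C_latter, and the adjustment strategy S are all left abstract in Algorithm 1.

```python
import numpy as np

def rat_prototype_step(grads, is_front, is_latter, adjust):
    """One mini-batch gradient pass sketching Algorithm 1 (RAT-prototype).

    grads     -- dict: layer name -> gradient array, as produced by the
                 base adversarial training algorithm A on mini-batch B
    is_front  -- predicate standing in for the front-layer condition C_front
    is_latter -- predicate standing in for the latter-layer condition C_latter
    adjust    -- the gradient adjustment strategy S
    """
    out = {}
    for name, g in grads.items():
        # Apply S only to layers matching C_front or C_latter;
        # all other layers keep the base gradients unchanged.
        out[name] = adjust(g) if (is_front(name) or is_latter(name)) else g
    return out

# Illustrative use: damp gradients in the first conv layer and the classifier.
grads = {"conv1": np.ones(3), "conv2": np.ones(3), "fc": np.ones(3)}
adjusted = rat_prototype_step(
    grads,
    is_front=lambda n: n == "conv1",
    is_latter=lambda n: n == "fc",
    adjust=lambda g: 0.5 * g,
)
```

The adjusted gradients would then be handed to the optimizer O for the parameter update, exactly as in an ordinary training step.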
Open Source Code | No | No explicit statement about providing open-source code for the methodology or a link to a code repository is found in the paper.
Open Datasets | Yes | CIFAR-10 (Krizhevsky et al., 2009). The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60,000 32×32 color images... CIFAR-100 (Krizhevsky et al., 2009). The CIFAR-100 dataset ... SVHN (Netzer et al., 2011). Street View House Numbers (SVHN) is a digit classification benchmark dataset...
Dataset Splits | Yes | The CIFAR-10 dataset ... There are 6,000 images per class with 5,000 training and 1,000 testing images per class. ... The CIFAR-100 dataset ... There are 500 training images and 100 testing images per class. ... SVHN ... has three sets: training and testing sets, and an extra set with 530,000 images that are less difficult and can be used for helping with the training process.
Hardware Specification | No | No specific hardware details (such as GPU models, CPU types, or memory) used for running experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions using SGD with momentum and weight decay but does not specify versions of any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python).
Experiment Setup | Yes | We use PreActResNet-18 (He et al., 2016) and WideResNet-34-10 (Zagoruyko & Komodakis, 2016) following the same hyperparameter settings for AT in Rice et al. (2020): for the L∞ threat model, ϵ = 8/255, step size is 1/255 for SVHN, and 2/255 for CIFAR-10 and CIFAR-100; for the L2 threat model, ϵ = 128/255, step size is 15/255 for all datasets. For training, all models are trained under 10-step PGD (PGD-10) attack for 200 epochs using SGD with momentum 0.9, weight decay 5 × 10−4, and a piecewise learning rate schedule with an initial learning rate of 0.1. Standard data augmentation techniques, including random cropping with 4 pixels of padding and random horizontal flips, are applied.
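For quick reference, the quoted hyperparameters can be collected into a small config alongside a plain-Python sketch of the piecewise schedule. The milestone epochs (100 and 150) are an assumption for illustration — the report only states a piecewise schedule with initial learning rate 0.1 over 200 epochs.

```python
def piecewise_lr(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Piecewise (step-decay) learning-rate schedule.

    Milestones and decay factor are assumed values, not quoted in the
    report; only the initial LR of 0.1 and 200 total epochs are stated.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # decay once per milestone passed
    return lr

# Hyperparameters as quoted in the report (per-dataset L-inf step sizes).
AT_CONFIG = {
    "epochs": 200,
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 5e-4,
    "attack": "PGD-10",
    "linf": {
        "epsilon": 8 / 255,
        "step_size": {"SVHN": 1 / 255, "CIFAR-10": 2 / 255, "CIFAR-100": 2 / 255},
    },
    "l2": {"epsilon": 128 / 255, "step_size": 15 / 255},
}
```

In a PyTorch setup this would typically map onto `torch.optim.SGD` plus a `MultiStepLR` scheduler, but the paper does not name a framework.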