Stability and Generalization in Free Adversarial Training

Authors: Xiwei Cheng, Kexin Fu, Farzan Farnia

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct several numerical experiments to evaluate the train-to-test generalization gap in vanilla and free adversarial training methods. Our empirical findings also suggest that the free adversarial training method could lead to a smaller generalization gap over a similar number of training iterations. The paper code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.
Researcher Affiliation | Academia | Xiwei Cheng (The Chinese University of Hong Kong), Kexin Fu (Purdue University), Farzan Farnia (The Chinese University of Hong Kong)
Pseudocode | Yes | Algorithm 1: Vanilla Adversarial Training (A_Vanilla); Algorithm 2: Free Adversarial Training (A_Free); Algorithm 3: Fast Adversarial Training (A_Fast); Algorithm 4: Free TRADES Adversarial Training (A_FreeTRADES)
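The "free" scheme (Algorithm 2 in the paper) amortizes the attack: each mini-batch is replayed m times, and a single backward pass supplies the gradient for both the descent step on the weights and the ascent step on the perturbation. A minimal NumPy sketch of that loop on a toy logistic-regression model (the function name, hyperparameters, and full-batch simplification are illustrative, not the paper's implementation):

```python
import numpy as np

def free_adversarial_training(X, y, epochs=8, m=4, eps=0.1, lr=0.05):
    """Toy sketch of free adversarial training on logistic regression.

    Each pass over the data is replayed m times, and the same gradient
    computation updates both the model weights (descent) and the
    persistent per-example perturbation delta (ascent).
    """
    n, d = X.shape
    w = np.zeros(d)
    delta = np.zeros_like(X)  # perturbation persists across replays
    for _ in range(epochs // m):  # replays count toward the epoch budget
        for _ in range(m):  # replay the same data m times
            z = (X + delta) @ w
            p = 1.0 / (1.0 + np.exp(-z))         # sigmoid predictions
            grad_z = (p - y) / n                  # dLoss/dz (logistic loss)
            grad_w = (X + delta).T @ grad_z       # gradient w.r.t. weights
            grad_x = np.outer(grad_z, w)          # gradient w.r.t. inputs
            w -= lr * grad_w                      # descent on the model
            delta += eps * np.sign(grad_x)        # ascent on the perturbation
            delta = np.clip(delta, -eps, eps)     # project to the eps-ball
    return w
```

Because the perturbation update reuses the gradient already computed for the weight update, each of the m replays costs roughly one backward pass, which is why the paper compares methods over a similar number of training iterations.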
Open Source Code | Yes | The paper code is available at https://github.com/Xiwei-Cheng/Stability_FreeAT.
Open Datasets | Yes | We conduct our experiments on datasets CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), Tiny-ImageNet (Le & Yang, 2015), and SVHN (Netzer et al., 2011).
Dataset Splits | No | The paper uses well-known datasets such as CIFAR-10 and CIFAR-100 and states it is 'Following the standard setting in Madry et al. (2017)', which implies the standard splits. However, it does not explicitly state split percentages or sample counts for training, validation, and test sets. It only describes how subsets were sampled for one experiment on varying data size: 'We randomly sampled a subset from the CIFAR-10 training dataset of size n ∈ {10000, 20000, 30000, 40000, 50000}'.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) used in the experiments.
Experiment Setup | Yes | We apply mini-batch gradient descent with batch size b = 128. Weight decay is set to 2 × 10^-4. We adopt a piecewise learning-rate decay schedule, starting at 0.1 and decaying by a factor of 10 at the 100th and 150th epochs, for 200 total epochs. For the vanilla algorithm, we used a PGD adversary to perturb the image. For the free algorithm, we applied the adversarial-attack learning rate α_δ = ε with free replay step m = 2, 4, 6, 8, and 10. For the fast adversarial training algorithm, we applied the adversarial-attack learning rate α_δ = 7/255 for the L_∞ attack and α_δ = 64/255 for the L_2 attack over 200 training epochs.
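The quoted schedule (base rate 0.1, divided by 10 at the 100th and 150th epochs over 200 total epochs) can be expressed as a small helper; a sketch with an illustrative function name, the milestone values taken from the setup above:

```python
def piecewise_lr(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Piecewise-constant decay: multiply by gamma at each milestone passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

This reproduces the three plateaus of the reported schedule: 0.1 for epochs 0-99, 0.01 for epochs 100-149, and 0.001 for epochs 150-199.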