Algorithmic Stability Based Generalization Bounds for Adversarial Training

Authors: Runzhi Tian, Yongyi Mao

ICLR 2025

Each reproducibility variable is listed below with its result and a supporting excerpt from the paper.
Research Type: Experimental — "Our additional experiments (e.g., Figure 1) suggest that this is quite common. In Figure 1, we perform AT with a 3-step PGD and measure the error of the model against a 3-step PGD attack, as well as its standard error, during training. ... We conduct experiments for PGD-AT when G is chosen as tanh_γ as well as the identity map."
Researcher Affiliation: Academia — Runzhi Tian, University of Ottawa; Yongyi Mao, University of Ottawa.
Pseudocode: No — The paper describes the AT algorithm iteratively with equations (7) and (8) but does not present it in a structured pseudocode or algorithm block.
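Since the paper gives the AT iteration only in equation form, a minimal Python sketch of a generic PGD-AT step may help; this is not the authors' code, the function names and the abstract `grad_fn`/`grad_x`/`grad_w` interfaces are illustrative, and it assumes the standard sign-gradient inner maximization.

```python
import numpy as np

def pgd_perturb(grad_fn, x, eps, step, k, rng):
    """Inner maximization: K-step l_inf PGD ascent from a random start.

    grad_fn(z) returns the gradient of the loss w.r.t. the input z.
    """
    delta = rng.uniform(-eps, eps, size=x.shape)            # random start
    for _ in range(k):
        delta = delta + step * np.sign(grad_fn(x + delta))  # signed ascent step
        delta = np.clip(delta, -eps, eps)                   # project onto the eps-ball
    return x + delta

def adversarial_training_step(w, x, y, grad_x, grad_w, lr, eps, step, k, rng):
    """Outer minimization: one SGD step on the loss at the adversarial point."""
    x_adv = pgd_perturb(lambda z: grad_x(w, z, y), x, eps, step, k, rng)
    return w - lr * grad_w(w, x_adv, y)
```

Alternating these two steps over mini-batches is the usual min-max training loop that equations of this form describe.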
Open Source Code: Yes — "Code is available at https://github.com/rzTian/AT-Stability."
Open Datasets: Yes — "Specifically, Rice et al. (2020) shows that on the CIFAR-10 dataset (Krizhevsky et al., 2009)... The experiments are conducted on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and SVHN (Netzer et al., 2011)."
Dataset Splits: No — The paper mentions evaluating on 'training and testing sets' (e.g., in the Figure 1 caption and Section 5) but does not specify exact split percentages, sample counts, or the methodology used to create the splits.
Hardware Specification: Yes — "Training PRN-18 on CIFAR-10 and SVHN for 200 epochs takes around 18 hours with two NVIDIA V100 GPUs, and training WRN-34 on CIFAR-100 requires around three days to complete with the same computing resources."
Software Dependencies: No — The paper mentions model architectures such as pre-activation ResNet-18 (PRN-18) and WideResNet-34 (WRN-34) but does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup: Yes — "In our experiments, we follow the settings in Rice et al. (2020): The perturbation radius is set to ϵ = 8/255 w.r.t. the ℓ∞ norm for the three datasets. ... We set K = 10 for all the PGD variants, with λ = 2/255 on CIFAR-10 and CIFAR-100 and λ = 1/255 on SVHN. The initial learning rate of AT is set to 0.1 for CIFAR-10 and CIFAR-100 and 0.01 for SVHN. The learning rate is decayed by 0.1 at the 100th and 150th epochs of training. The batch size is set to 128, and a weight decay of 5 × 10⁻⁴ is used for all the experiments."
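For reference, the reported hyperparameters can be collected into a small configuration sketch; the dictionary keys and function name below are illustrative choices, not identifiers from the paper's code, and only the numeric values come from the quoted setup.

```python
def learning_rate(epoch, base_lr=0.1):
    """Piecewise-constant schedule: multiply the rate by 0.1 at epochs 100 and 150."""
    lr = base_lr
    if epoch >= 100:
        lr *= 0.1
    if epoch >= 150:
        lr *= 0.1
    return lr

# Reported PGD-AT settings (Rice et al., 2020 configuration).
AT_CONFIG = {
    "epsilon": 8 / 255,        # l_inf perturbation radius, all three datasets
    "pgd_steps": 10,           # K, for all PGD variants
    "pgd_step_size": {"cifar10": 2 / 255, "cifar100": 2 / 255, "svhn": 1 / 255},
    "base_lr": {"cifar10": 0.1, "cifar100": 0.1, "svhn": 0.01},
    "batch_size": 128,
    "weight_decay": 5e-4,
    "epochs": 200,
}
```

A training loop would call `learning_rate(epoch, AT_CONFIG["base_lr"][dataset])` at the start of each epoch to reproduce the stated decay at epochs 100 and 150.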