Algorithmic Stability Based Generalization Bounds for Adversarial Training
Authors: Runzhi Tian, Yongyi Mao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our additional experiments (e.g., Figure 1) suggest that this is quite common. In Figure 1, we perform AT with a 3-step PGD and measure the error of the model against a 3-step PGD attack as well as its standard error during training. ... Experiments We conduct experiments for PGD-AT when G is chosen as tanhγ as well as the identity map. |
| Researcher Affiliation | Academia | Runzhi Tian University of Ottawa EMAIL Yongyi Mao University of Ottawa EMAIL |
| Pseudocode | No | The paper describes the AT algorithm iteratively with equations (7) and (8) but does not present it in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/rzTian/AT-Stability. |
| Open Datasets | Yes | Specifically, Rice et al. (2020) shows that on the CIFAR-10 dataset (Krizhevsky et al., 2009)... The experiments are conducted on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and SVHN (Netzer et al., 2011). |
| Dataset Splits | No | The paper mentions evaluating on 'training and testing sets' (e.g., in Figure 1 caption and Section 5), but does not specify the exact split percentages, sample counts, or the methodology used to create these splits. |
| Hardware Specification | Yes | Training PRN-18 on CIFAR-10 and SVHN for 200 epochs spends around 18 hours with two NVIDIA V100 GPUs, and training WRN-34 on CIFAR-100 requires around three days to complete with the same computing resources. |
| Software Dependencies | No | The paper mentions model architectures like 'pre-activation ResNet-18 (PRN-18)' and 'Wide ResNet-34 (WRN-34)' but does not specify any software libraries or frameworks with their version numbers, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | Yes | In our experiments, we follow the settings in Rice et al. (2020): The perturbation radius is set to be ϵ = 8/255 w.r.t. the ℓ∞ norm for the three datasets. ... We set K = 10 for all the PGD variants with λ = 2/255 on CIFAR-10 and CIFAR-100, and set λ = 1/255 for SVHN. The initial learning rate of AT is set to be 0.1 for CIFAR-10 and CIFAR-100 and set to be 0.01 for SVHN. The learning rate is decayed by 0.1 at the 100th and the 150th epoch of the training. The batch size is set to be 128 and a weight decay of 5 × 10⁻⁴ is used for all the experiments. |
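The PGD settings quoted above (ϵ = 8/255, step size λ = 2/255, K = 10 steps) can be sketched as the standard PGD inner maximization used in adversarial training. This is a minimal toy illustration, not the authors' implementation: the quadratic loss gradient below is a hypothetical stand-in for the network's classification loss, and only the hyperparameters come from the paper.

```python
import numpy as np

EPS = 8 / 255   # l_inf perturbation radius (all three datasets)
LAM = 2 / 255   # PGD step size (CIFAR-10/100 setting; 1/255 for SVHN)
K = 10          # number of PGD steps

def loss_grad(x):
    # Gradient of a toy loss L(x) = 0.5 * ||x||^2; in AT this would be the
    # gradient of the classification loss w.r.t. the input (an assumption
    # for illustration, not the paper's model).
    return x

def pgd_perturb(x, eps=EPS, lam=LAM, k=K):
    """Return an l_inf-bounded adversarial example around x."""
    delta = np.zeros_like(x)
    for _ in range(k):
        g = loss_grad(x + delta)
        delta = delta + lam * np.sign(g)   # signed-gradient ascent step
        delta = np.clip(delta, -eps, eps)  # project back onto the l_inf ball
    return x + delta

x = np.array([0.5, -0.3, 0.1])
x_adv = pgd_perturb(x)
print(np.max(np.abs(x_adv - x)))  # perturbation never exceeds eps
```

With K·λ = 20/255 > ϵ, the perturbation saturates the ℓ∞ ball in this toy case, which matches the common practice of choosing a step size and step count whose product exceeds the radius.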