Identifying and Understanding Cross-Class Features in Adversarial Training

Authors: Zeming Wei, Steven Y. Guo, Yisen Wang

ICML 2025

Reproducibility assessment — each item below gives the variable, the result, and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental. "Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until the best robustness checkpoint... We provide extensive empirical evidence to support the observations and the hypothesis. First, we propose a metric to measure the usage of the cross-class features for a certain model. Then, among various perturbation norms, datasets, and architectures, we show that the best robustness model consistently uses more cross-class features than the robust overfitted ones, showing a clear correlation between robust generalization and cross-class features. We further provide theoretical insights to intuitively understand this effect through a synthetic data model, where we show that cross-class features are more sensitive to robust loss, but they are indeed helpful for robust classification."
Researcher Affiliation: Academia. "1) School of Mathematical Sciences, Peking University; 2) Independent Researcher; 3) State Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; 4) Institute for Artificial Intelligence, Peking University. Correspondence to: Yisen Wang <EMAIL>."
Pseudocode: Yes. "The complete algorithm of calculating matrix C is shown in Algorithm 1 in Appendix. For two classes indexed by i and j, C[i, j] represents the similarity of their feature attribution vector, where a higher value indicates the model uses more features shared by these classes. Algorithm 1: Feature Attribution Correlation Matrix."
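The paper defines C[i, j] as a similarity score between the feature attribution vectors of classes i and j. A minimal sketch of that computation, under assumptions not taken from the authors' code (one attribution vector per class, already aggregated over that class's samples, compared via cosine similarity):

```python
# Hypothetical sketch of a feature-attribution correlation matrix.
# Assumption: `attributions` is a (num_classes, feature_dim) array whose
# i-th row is class i's attribution vector (e.g., averaged over samples);
# the paper's exact attribution method is given in its Algorithm 1.
import numpy as np

def attribution_correlation_matrix(attributions):
    """Return C with C[i, j] = cosine similarity of the attribution
    vectors of classes i and j (higher => more shared features)."""
    norms = np.linalg.norm(attributions, axis=1, keepdims=True)
    unit = attributions / np.clip(norms, 1e-12, None)  # row-normalize
    return unit @ unit.T  # pairwise cosine similarities

# Toy usage: 3 classes, 4-dimensional features. Classes 0 and 1 share a
# feature (index 0); class 2 uses a disjoint feature.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
C = attribution_correlation_matrix(A)
```

Here C[0, 1] ≈ 0.71 while C[0, 2] = 0, matching the intended reading: a higher entry means the two classes rely on more shared features.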
Open Source Code: Yes. "Our code is available at https://github.com/PKU-ML/Cross-Class-Features-AT."
Open Datasets: Yes. "The model is trained on the CIFAR-10 dataset (Krizhevsky et al., 2009) using PreActResNet-18 (He et al., 2016) for 200 epochs... We extend our observations by illustrating the comparisons on the CIFAR-100 (Krizhevsky et al., 2009) and the Tiny ImageNet (mnmoustafa, 2017) datasets in Figure 5."
Dataset Splits: Yes. "The model is trained on the CIFAR-10 dataset (Krizhevsky et al., 2009)... We extend our observations by illustrating the comparisons on the CIFAR-100 (Krizhevsky et al., 2009) and the Tiny ImageNet (mnmoustafa, 2017) datasets in Figure 5."
Hardware Specification: No. No specific hardware details (such as GPU/CPU models, processors, or memory) are mentioned in the paper. It discusses model architectures (e.g., PreActResNet-18, DeiT-Ti) but not the underlying hardware used for experiments.
Software Dependencies: No. The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiments.
Experiment Setup: Yes. "For the detailed configurations of training, we follow the implementation of (Pang et al., 2021), which provides a popular repository of AT with basic training tricks. The model is trained on the CIFAR-10 dataset (Krizhevsky et al., 2009) using PreActResNet-18 (He et al., 2016) for 200 epochs... A complete list of hyperparameters for experiments in this Section is presented in Appendix B."

Table 2. Hyperparameters for AT
  Train epochs: 200
  SGD momentum: 0.9
  Batch size: 128
  Weight decay: 5e-4
  Initial learning rate: 0.1
  Learning rate decay: at the 100th and 150th epochs
  Learning rate decay rate: 0.1
  Training PGD steps: 10
  Training PGD step size (ℓ∞): ε/4 (ε is the perturbation bound)
  Training PGD step size (ℓ2): ε/8 (ε is the perturbation bound)
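The PGD settings in Table 2 (10 steps, ℓ∞ step size ε/4) can be sketched as follows. This is not the authors' implementation: a toy linear classifier with an analytic logistic-loss gradient stands in for the network, and the function name and interface are illustrative only.

```python
# Minimal sketch of the l-infinity PGD attack used during training in
# Table 2 (10 steps, step size eps/4). A toy linear model w with labels
# y in {-1, +1} replaces the network so gradients are analytic; names
# and interface are hypothetical, not the authors' code.
import numpy as np

def pgd_linf(x, y, w, eps, steps=10, step_size=None):
    """Maximize the logistic loss log(1 + exp(-y * (w . x_adv)))
    over the l-infinity ball of radius eps centered at x."""
    if step_size is None:
        step_size = eps / 4  # Table 2: training PGD step size eps/4
    x_adv = x.copy()
    for _ in range(steps):
        margin = y * (x_adv @ w)
        grad = -y * w / (1.0 + np.exp(margin))  # d(loss)/d(x_adv)
        x_adv = x_adv + step_size * np.sign(grad)  # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the ball
    return x_adv

# Usage: the perturbation stays inside the eps-ball and lowers the
# margin y * (w . x), i.e., increases the loss.
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.5, 1.0, -1.0])
x_adv = pgd_linf(x, y=1.0, w=w, eps=8 / 255)
```

In adversarial training, each minibatch would first be perturbed this way and the model then updated on the perturbed inputs with the SGD settings listed above.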