Improving Robust Generalization with Diverging Spanned Latent Space
Authors: Owen Dou, Zhiqiang Gao, Hangchi Shen, Ziling Yuan, Shufei Zhang, Kaizhu Huang
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed method can improve advanced AT methods and work remarkably well on various datasets, including CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet. We now turn to solve the optimization problem outlined in Section 3.2. First, to expand the entire volume of the latent linear span Vol(S), we introduce the function L_span = log det(S) (Eq. 8). The function log det(·) is smooth and concave, which aids in reaching an optimal solution more effectively. Next, to reduce the volume of each per-class subspace linear span, we define the contraction component L_shrink = Σ_i (N_i / 2N) log det(I + ω S_i) (Eq. 9), where N_i is the number of training samples of class i, S_i the span of that class, and ω a pre-defined parameter. The two components are combined as L_diverge = γ L_shrink − (1 − γ) L_span (Eq. 10), where γ is a balance hyper-parameter scaling the two terms. Section 6 (Experiments) conducts comprehensive experiments to gauge the effectiveness of the SD method in countering diverse adversarial examples. Section 6.1 (Experimental Setting): We evaluate our method's robustness against white-box and black-box adversarial examples on CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet. |
| Researcher Affiliation | Collaboration | Zhihao Dou (Data Science Research Center, Duke Kunshan University); Zhiqiang Gao (Department of Computer Science, College of Science, Mathematics and Technology, Wenzhou-Kean University); Hangchi Shen (College of Artificial Intelligence, Southwest University); Ziling Yuan (Bytedance); Shufei Zhang (Shanghai Artificial Intelligence Laboratory); Kaizhu Huang (Data Science Research Center, Duke Kunshan University) |
| Pseudocode | Yes | Algorithm 1: Adversarial Training with our Subspace Diverging (SD).<br>**Input:** a neural network f_θ(·) with learnable parameters θ; n̂ batches of data pairs {(x̂_1, ŷ_1), (x̂_2, ŷ_2), …, (x̂_n̂, ŷ_n̂)}; batch size V; a pre-defined hyper-parameter λ; noise ϵ ∈ (−0.015, 0.015); number of epochs e.<br>**Output:** robust neural network f_θ(·).<br>Initialize L^max_gene ← 0, L^max_diverge ← 0, L^max ← 0.<br>**for** i ← 1 to n̂ **do**<br>&nbsp;&nbsp;Add noise to the data samples: x̂′_i = x̂_i + ϵ.<br>&nbsp;&nbsp;Generate adversarial examples for the i-th batch: x̂^adv_i ← argmax over x̂′_i of L_gene(x̂′_i, ŷ_i, θ).<br>&nbsp;&nbsp;Track the maximum loss value of each of the three terms:<br>&nbsp;&nbsp;L^max_gene ← max(L_gene(x̂′_i, ŷ_i, θ), L^max_gene);<br>&nbsp;&nbsp;L^max_diverge ← max(L_diverge(x̂′_i, ŷ_i, θ), L^max_diverge);<br>&nbsp;&nbsp;L^max ← max(L(x̂^adv_i, ŷ_i, θ), L^max).<br>**end for**<br>Compute k and d by: k = L^max_diverge / L^max and d = L^max_diverge / L^max_gene.<br>**for** j ← 1 to e **do**<br>&nbsp;&nbsp;**for** i ← 1 to n̂ **do**<br>&nbsp;&nbsp;&nbsp;&nbsp;x̂′_i ← x̂_i + ϵ.<br>&nbsp;&nbsp;&nbsp;&nbsp;Generate adversarial examples: x̂^adv_i ← argmax over x̂′_i of [L_gene(x̂′_i, ŷ_i, θ) + (1/d) L_diverge(x̂′_i, ŷ_i, θ)].<br>&nbsp;&nbsp;&nbsp;&nbsp;Update the classifier: θ ← argmin over θ of (1/V) [L(x̂^adv_i, ŷ_i, θ) + (λ/k) L_diverge(x̂^adv_i, ŷ_i, θ)].<br>&nbsp;&nbsp;**end for**<br>**end for**<br>**return** robust neural network f_θ(·). |
| Open Source Code | No | The paper does not contain an explicit statement about releasing code for the described methodology, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Extensive experiments show that the proposed method can improve advanced AT methods and work remarkably well on various datasets, including CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet. |
| Dataset Splits | Yes | We evaluate our method's robustness against white-box and black-box adversarial examples on CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet. All the models discussed in this subsection are trained using Feature Scatter (FS) on the SVHN dataset. We select three classes for visualizing the feature distributions on the test dataset of CIFAR-10. We randomly select 1,000 samples for each class from the test dataset and sort all the selected samples according to their class indexes. Figures 5c and 5d illustrate the robust accuracy gap under C&W and PGD20 between the training and test datasets of CIFAR-10. |
| Hardware Specification | Yes | All experiments are conducted on a single GPU (e.g. an RTX 3090) in an environment using CUDA 11.7, Python 3.8, and PyTorch 1.8.0. |
| Software Dependencies | Yes | All experiments are conducted on a single GPU (e.g. an RTX 3090) in an environment using CUDA 11.7, Python 3.8, and PyTorch 1.8.0. |
| Experiment Setup | Yes | In our training regimen, we employ SGD with a momentum of 0.9, a weight decay of 5 × 10^-4, and an initial learning rate of 0.1. The learning rate decreases at epochs 60 and 90 by a factor of 0.1. During training, we perform 7 attack iterations for PGD-AT and TRADES, and 1 iteration for FS. For consistency, the attack budget ε is maintained at 8/255 for all methods. Adversarial examples are computed under the ℓ∞ norm during both training and testing. Table 1 lists all parameters used for the different baselines, batch sizes, and training epochs: γ is the balance parameter, ω and λ are pre-defined parameters, and d and k are normalization parameters. |
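The divergence objective quoted in the Research Type row (Eqs. 8–10) can be sketched numerically. This is a minimal numpy sketch, not the authors' implementation: it assumes the span matrix S is the second-moment matrix of the batch features (with a small ridge term `eps` to keep it invertible) and that each per-class span S_i is the second-moment matrix of that class's features — both assumptions, since the extracted text does not fully specify the argument of log det in Eq. 9.

```python
import numpy as np

def span_loss(S):
    # L_span = log det(S)  (Eq. 8): volume of the whole latent linear span.
    # slogdet is used for numerical stability with near-singular S.
    _, logdet = np.linalg.slogdet(S)
    return logdet

def shrink_loss(class_feats, omega):
    # L_shrink = sum_i (N_i / 2N) log det(I + omega * S_i)  (Eq. 9),
    # with S_i assumed to be the second-moment matrix of class i's features.
    N = sum(Z.shape[0] for Z in class_feats)
    total = 0.0
    for Z in class_feats:                  # Z has shape (N_i, dim)
        N_i, dim = Z.shape
        S_i = Z.T @ Z / N_i
        _, logdet = np.linalg.slogdet(np.eye(dim) + omega * S_i)
        total += (N_i / (2 * N)) * logdet  # class-size-weighted contribution
    return total

def diverge_loss(class_feats, omega, gamma, eps=1e-3):
    # L_diverge = gamma * L_shrink - (1 - gamma) * L_span  (Eq. 10):
    # minimizing it contracts per-class spans while expanding the overall span.
    Z_all = np.concatenate(class_feats, axis=0)
    S = Z_all.T @ Z_all / Z_all.shape[0] + eps * np.eye(Z_all.shape[1])
    return gamma * shrink_loss(class_feats, omega) - (1 - gamma) * span_loss(S)
```

Because each S_i is positive semi-definite and ω > 0, every log det(I + ω S_i) term is non-negative, so L_shrink ≥ 0, matching its role as a contraction penalty.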
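Algorithm 1's warm-up pass exists only to compute the two normalizers d and k, which rescale L_diverge to the magnitudes of the generation loss and the classification loss respectively. A minimal sketch of that bookkeeping, under the reading that both are ratios of the tracked maxima (the extracted pseudocode drops the operators); the function names are illustrative, not from the paper:

```python
def compute_normalizers(l_gene_max, l_diverge_max, l_max):
    # d rescales L_diverge to the scale of the generation loss L_gene;
    # k rescales it to the scale of the classification loss L.
    d = l_diverge_max / l_gene_max
    k = l_diverge_max / l_max
    return d, k

def attack_objective(l_gene, l_diverge, d):
    # Inner maximization: objective maximized over the perturbed input
    # when crafting adversarial examples (L_gene + (1/d) L_diverge).
    return l_gene + l_diverge / d

def training_objective(l_cls, l_diverge, k, lam):
    # Outer minimization: per-batch objective minimized over theta
    # on the generated adversarial examples (L + (lambda/k) L_diverge).
    return l_cls + lam * l_diverge / k
```

With this scaling, L_diverge / d ≈ O(L_gene) and L_diverge / k ≈ O(L), so λ trades off the two terms on comparable footing regardless of the raw magnitude of L_diverge.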
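The learning-rate schedule in the Experiment Setup row (initial rate 0.1, decayed by a factor of 0.1 at epochs 60 and 90) is a plain step schedule; a small sketch of it, equivalent to PyTorch's `MultiStepLR` with `milestones=[60, 90]` and `gamma=0.1`:

```python
def learning_rate(epoch, base_lr=0.1):
    # Step schedule: multiply the rate by 0.1 at each milestone reached.
    lr = base_lr
    for milestone in (60, 90):
        if epoch >= milestone:
            lr *= 0.1
    return lr
```

So epochs 0–59 train at 0.1, epochs 60–89 at 0.01, and epochs 90 onward at 0.001.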