Accelerating Adversarial Training on Under-Utilized GPU

Authors: Zhuoxin Zhan, Ke Wang, Pulei Xiong

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental The results on various machine learning tasks and datasets show that AttackRider can speed up state-of-the-art adversarial training algorithms with comparable robust accuracy. We conduct extensive experiments to study the speedup provided by AttackRider for the targeted application scenarios discussed in Sec. 2. All experiments are conducted on a server with four NVIDIA RTX 6000 Ada GPUs and five datasets from the image and tabular domains. Table 1 summarizes the GPUs, datasets, model information, batch size, and O. We adopt ResNet-18 (RN) [He et al., 2016] for the image datasets and FT-Transformer (FT-T) [Gorishniy et al., 2021] for the tabular datasets. With our GPU server, these models represent varying extents of GPU under-utilization, indicated by the different O values. The source code of AttackRider is available at https://github.com/zxzhan/AttackRider.
Researcher Affiliation Academia Zhuoxin Zhan¹, Ke Wang¹ and Pulei Xiong²,¹ (¹Simon Fraser University, ²National Research Council Canada). EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1 AttackRider-e
Input: Training dataset D; base AT = {GetFB(), Atk2(), Upd()}; packing size e
Output: Robust model f_θ
1: Initialize f_θ with random parameters θ
2: repeat
   // Attack Generation:
3:   FB ← GetFB()
4:   b ← e / FB
5:   Read b minibatches X from D
6:   X̃ ← Atk2(X, FB, f_θ)
   // Model Update:
7:   X̃ ← Shuffle(X̃)
8:   {X̃_i | i = 1, ..., b} ← Divide(X̃)
9:   {X_i | i = 1, ..., b}, where the examples in X_i correspond to X̃_i
10:  for i = 1, ..., b do
11:    θ ← Upd(X_i, X̃_i, f_θ)
12: until training converged
13: return f_θ
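As a rough illustration of the packed loop in Algorithm 1, here is a minimal NumPy sketch. The toy linear model, the FGSM-style stand-in for Atk2(), the placeholder gradient in Upd(), and all sizes are assumptions for illustration only, not the paper's implementation; the key structure is that one large attack-generation call feeds b subsequent model updates.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)               # toy model parameters (stand-in for f_theta)

def get_fb():                        # GetFB(): forward/backward budget per example (assumed)
    return 2

def atk2(X, fb, w):                  # Atk2(): fb signed-gradient steps (FGSM-style stand-in)
    X_adv = X.copy()
    for _ in range(fb):
        X_adv += 0.01 * np.sign(w)   # gradient of w.x w.r.t. x is w
    return X_adv

def upd(X, X_adv, w, lr=0.1):        # Upd(): one SGD-style step on a placeholder gradient
    grad = (X_adv - X).mean(axis=0)
    return w - lr * grad

e = 8                                # packing size (assumed)
fb = get_fb()
b = e // fb                          # number of minibatches packed per attack call
X = rng.normal(size=(b * 4, 4))      # read b minibatches at once
X_adv = atk2(X, fb, w)               # attack generation on one big packed batch

perm = rng.permutation(len(X))       # Shuffle, keeping (X_i, X̃_i) pairs aligned
X, X_adv = X[perm], X_adv[perm]
for Xi, Xi_adv in zip(np.array_split(X, b), np.array_split(X_adv, b)):
    w = upd(Xi, Xi_adv, w)           # b model updates per attack-generation phase
```

The point of the packing is that the expensive attack generation runs on a batch b times larger than the update minibatch, which can raise GPU utilization when the model under-utilizes the device.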
Open Source Code Yes The source code of AttackRider is available at https://github.com/zxzhan/AttackRider.
Open Datasets Yes For example, on the CIFAR-10 dataset [Krizhevsky et al., 2009], training a WideResNet-34-10 [Zagoruyko and Komodakis, 2016] model with PGDAT took 45.22 hours but natural training only took 3.88 hours [Zhang et al., 2019a]. Table 1: Summary of GPU, datasets and models. # Train and # Test are the numbers of samples in the training and test sets, m is the training batch size of the base AT algorithm, and O is the overhead throughput for the model/GPU setup. GPU: 4× RTX 6000.

|            | CIFAR-10 | CIFAR-100 | Tiny ImageNet | Jannis  | Covertype |
| Domain     | Image    | Image     | Image         | Tabular | Tabular   |
| Model      | RN       | RN        | RN            | FT-T    | FT-T      |
| Input Size | 32×32×3  | 32×32×3   | 64×64×3       | 59      | 59        |
| Model Size | 3.4×10⁹  | 3.4×10⁹   | 1.37×10⁹      | 9×10⁵   | 9×10⁵     |
| # Class    | 10       | 100       | 200           | 4       | 7         |
| # Train    | 50,000   | 50,000    | 100,000       | 53,588  | 371,847   |
| # Test     | 10,000   | 10,000    | 10,000        | 16,749  | 116,203   |
| m          | 128      | 128       | 128           | 512     | 1024      |
| O          | 6.0      | 6.0       | 2.0           | 6.0     | 3.0       |
Dataset Splits Yes Table 1: Summary of GPU, datasets and models. # Train and # Test are the numbers of samples in the training and test sets. # Train: 50,000 (CIFAR-10), 50,000 (CIFAR-100), 100,000 (Tiny ImageNet), 53,588 (Jannis), 371,847 (Covertype). # Test: 10,000, 10,000, 10,000, 16,749 and 116,203, respectively.
Hardware Specification Yes All experiments are conducted on a server with four NVIDIA RTX 6000 Ada GPUs and five datasets from the image and tabular domains. Table 1: Summary of GPU, datasets and models. GPU: 4× RTX 6000.
Software Dependencies No All models on the image datasets are trained with an SGD optimizer for 120 epochs following [Li et al., 2023a], and all models on the tabular datasets are trained with an AdamW optimizer for 100 epochs following [Gorishniy et al., 2021]. We evaluate the model that has the best test PGD20 robust accuracy within the specified number of epochs. For hyperparameter and base-AT-specific settings, we mostly follow the original papers of each base AT; interested readers should refer to the separate Appendix file for more details. This text does not provide specific software versions for libraries like PyTorch, TensorFlow, CUDA, etc.
Experiment Setup Yes All models on the image datasets are trained with an SGD optimizer for 120 epochs following [Li et al., 2023a], and all models on the tabular datasets are trained with an AdamW optimizer for 100 epochs following [Gorishniy et al., 2021]. We evaluate the model that has the best test PGD20 robust accuracy within the specified number of epochs. For hyperparameter and base-AT-specific settings, we mostly follow the original papers of each base AT; interested readers should refer to the separate Appendix file for more details. Table 1: Summary of GPU, datasets and models. m = 128 (CIFAR-10), 128 (CIFAR-100), 128 (Tiny ImageNet), 512 (Jannis), 1024 (Covertype).
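The model-selection criterion above (best test PGD20 robust accuracy) can be sketched on a toy problem. The linear logistic model, epsilon, and step size below are illustrative assumptions; the paper follows each base AT's original attack configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.array([1.0, -1.0])            # toy binary classifier: predict sign(w.x)

def pgd_attack(x, y, w, eps=0.1, alpha=0.02, steps=20):
    """L_inf PGD (20 steps) ascending the logistic loss of a linear model."""
    x_adv = x.copy()
    for _ in range(steps):
        margin = y * (x_adv @ w)
        # d/dx log(1 + exp(-y * w.x)) = -y * w * sigmoid(-y * w.x)
        grad = -y[:, None] * w * (1.0 / (1.0 + np.exp(margin)))[:, None]
        x_adv += alpha * np.sign(grad)            # signed-gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv

X = rng.normal(size=(200, 2))
y = np.where(X @ w > 0, 1.0, -1.0)   # labels taken from the model itself
clean_acc = np.mean(np.sign(X @ w) == y)
X_adv = pgd_attack(X, y, w)
robust_acc = np.mean(np.sign(X_adv @ w) == y)    # PGD20 robust accuracy
```

During training one would compute `robust_acc` on the test set each epoch and keep the checkpoint that maximizes it, which is the selection rule the quoted setup describes.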