Accelerating Adversarial Training on Under-Utilized GPU
Authors: Zhuoxin Zhan, Ke Wang, Pulei Xiong
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results on various machine learning tasks and datasets show that AttackRider can speed up state-of-the-art adversarial training algorithms with comparable robust accuracy. We conduct extensive experiments to study the speedup provided by AttackRider for the targeted application scenarios discussed in Sec. 2. All experiments are conducted on a server with four NVIDIA RTX 6000 Ada GPUs and five datasets from image and tabular domains. Table 1 summarizes the GPUs, datasets, model information, batch size, and O. We adopt ResNet-18 (RN) [He et al., 2016] for the image datasets and FT-Transformer (FT-T) [Gorishniy et al., 2021] for the tabular datasets. With our GPU server, these models represent varying extents of GPU under-utilization, indicated by the different O values. The source code of AttackRider is available at https://github.com/zxzhan/AttackRider. |
| Researcher Affiliation | Academia | Zhuoxin Zhan (1), Ke Wang (1) and Pulei Xiong (2, 1); (1) Simon Fraser University, (2) National Research Council Canada. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 AttackRider-e. Input: training dataset D; base AT = {GetFB(), Atk2(), Upd()}; packing size e. Output: robust model fθ. 1: initialize fθ with random parameters; 2: repeat; // Attack Generation: 3: FB ← GetFB(); 4: b ← e/FB; 5: read b minibatches X from D; 6: X̃ ← Atk2(X, FB, fθ); // Model Update: 7: X̃ ← Shuffle(X̃); 8: {X̃ᵢ \| i = 1, …, b} ← Divide(X̃); 9: {Xᵢ \| i = 1, …, b}, where the examples in Xᵢ correspond to X̃ᵢ; 10: for i = 1, …, b do; 11: θ ← Upd(X̃ᵢ, Xᵢ, fθ); 12: until training converged; 13: return fθ |
| Open Source Code | Yes | The source code of AttackRider is available at https://github.com/zxzhan/AttackRider. |
| Open Datasets | Yes | For example, on the CIFAR-10 dataset [Krizhevsky et al., 2009], training a WideResNet-34-10 [Zagoruyko and Komodakis, 2016] model with PGDAT took 45.22 hours but natural training only took 3.88 hours [Zhang et al., 2019a]. Table 1: Summary of GPU, datasets and models (# Train and # Test are the numbers of samples in the training and test sets, m is the training batch size of the base AT algorithm, and O is the overhead throughput for the model/GPU setup). GPU: 4× RTX 6000. Image datasets: CIFAR-10 (RN, input 32×32×3, model size 34×10⁹, 10 classes, 50,000 train, 10,000 test, m = 128, O = 6.0); CIFAR-100 (RN, 32×32×3, 34×10⁹, 100 classes, 50,000 train, 10,000 test, m = 128, O = 6.0); Tiny ImageNet (RN, 64×64×3, 137×10⁹, 200 classes, 100,000 train, 10,000 test, m = 128, O = 2.0). Tabular datasets: Jannis (FT-T, input 59, 9×10⁵, 4 classes, 53,588 train, 16,749 test, m = 512, O = 6.0); CoverType (FT-T, 59, 9×10⁵, 7 classes, 371,847 train, 116,203 test, m = 1024, O = 3.0). |
| Dataset Splits | Yes | Table 1: Summary of GPU, datasets and models (# Train and # Test are the numbers of samples in the training and test sets, m is the training batch size of the base AT algorithm, and O is the overhead throughput for the model/GPU setup). GPU: 4× RTX 6000. Image datasets: CIFAR-10 (RN, input 32×32×3, model size 34×10⁹, 10 classes, 50,000 train, 10,000 test, m = 128, O = 6.0); CIFAR-100 (RN, 32×32×3, 34×10⁹, 100 classes, 50,000 train, 10,000 test, m = 128, O = 6.0); Tiny ImageNet (RN, 64×64×3, 137×10⁹, 200 classes, 100,000 train, 10,000 test, m = 128, O = 2.0). Tabular datasets: Jannis (FT-T, input 59, 9×10⁵, 4 classes, 53,588 train, 16,749 test, m = 512, O = 6.0); CoverType (FT-T, 59, 9×10⁵, 7 classes, 371,847 train, 116,203 test, m = 1024, O = 3.0). |
| Hardware Specification | Yes | All experiments are conducted on a server with four NVIDIA RTX 6000 Ada GPUs and five datasets from image and tabular domains. Table 1: Summary of GPU, datasets and models. GPU: 4× RTX 6000. |
| Software Dependencies | No | All models on the image datasets are trained with an SGD optimizer for 120 epochs following [Li et al., 2023a], and all models on the tabular datasets are trained with an AdamW optimizer for 100 epochs following [Gorishniy et al., 2021]. We evaluate the model that has the best test PGD20 robust accuracy within the specified number of epochs. For hyperparameter and base-AT-specific settings, we mostly follow the original papers of each base AT. Interested readers may refer to the separate Appendix file for more details. This text does not provide specific software versions for libraries such as PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | All models on the image datasets are trained with an SGD optimizer for 120 epochs following [Li et al., 2023a], and all models on the tabular datasets are trained with an AdamW optimizer for 100 epochs following [Gorishniy et al., 2021]. We evaluate the model that has the best test PGD20 robust accuracy within the specified number of epochs. For hyperparameter and base-AT-specific settings, we mostly follow the original papers of each base AT. Interested readers may refer to the separate Appendix file for more details. Table 1: Summary of GPU, datasets and models. m = 128 (CIFAR-10), 128 (CIFAR-100), 128 (Tiny ImageNet), 512 (Jannis), 1024 (CoverType). |
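The packing loop in Algorithm 1 can be sketched in a few lines. The NumPy version below is an illustration only, not the paper's implementation: `get_fb`, `atk2`, and `upd` are placeholder stand-ins (a real implementation would run PGD-style attacks and optimizer updates on a PyTorch model), and the hyperparameters `eps`, `lr`, and `m` are invented for the sketch.

```python
import numpy as np

def get_fb():
    """Placeholder for GetFB(): number of attack forward/backward steps."""
    return 2

def atk2(X, fb, theta, eps=0.1):
    """Placeholder for Atk2(): fb signed-gradient steps within an eps ball.
    A stand-in for a PGD-style attack; sign(theta) fakes the loss gradient."""
    X_adv = X.copy()
    for _ in range(fb):
        grad = np.sign(theta)
        X_adv = np.clip(X_adv + (eps / fb) * grad, X - eps, X + eps)
    return X_adv

def upd(X_adv, X, theta, lr=0.01):
    """Placeholder for Upd(): one SGD-style step on a packed minibatch chunk."""
    grad = (X_adv - X).mean(axis=0)  # stand-in for the parameter gradient
    return theta - lr * grad

def attack_rider_e(dataset, e, theta, iterations=1, m=4, seed=0):
    """Toy AttackRider-e loop: pack b = e/FB minibatches, attack them jointly,
    then shuffle, divide back into b chunks, and update once per chunk."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):                 # "repeat ... until converged"
        # Attack Generation (Alg. 1, lines 3-6)
        fb = get_fb()
        b = e // fb                              # packing factor
        idx = rng.choice(len(dataset), size=b * m, replace=False)
        X = dataset[idx]                         # b minibatches read at once
        X_adv = atk2(X, fb, theta)
        # Model Update (Alg. 1, lines 7-11)
        perm = rng.permutation(len(X_adv))       # Shuffle, keeping pairs aligned
        X_adv, X = X_adv[perm], X[perm]
        for Xa_i, X_i in zip(np.array_split(X_adv, b), np.array_split(X, b)):
            theta = upd(Xa_i, X_i, theta)        # one update per divided chunk
    return theta
```

The point of the structure is that the expensive attack step runs on a packed batch of b minibatches (amortizing per-launch overhead on an under-utilized GPU), while model updates still happen at the base algorithm's batch size m.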