Indirect Gradient Matching for Adversarial Robust Distillation

Authors: Hongsin Lee, Seungju Cho, Changick Kim

ICLR 2025

Reproducibility assessment (Variable: Result. LLM Response):
Research Type: Experimental. Experimental results show that IGDM seamlessly integrates with existing AD methods, significantly enhancing their performance. In particular, applying IGDM on the CIFAR-100 dataset improves AutoAttack accuracy from 28.06% to 30.32% with the ResNet-18 architecture and from 26.18% to 29.32% with the MobileNetV2 architecture when integrated into the SOTA method without additional data augmentation.
Researcher Affiliation: Academia. Hongsin Lee, Seungju Cho, Changick Kim; Korea Advanced Institute of Science and Technology (KAIST).
Pseudocode: Yes. Appendix C ("Main Algorithm") provides Algorithm 1, "Main Algorithm".
Open Source Code: No. The paper does not provide concrete access to source code for the described methodology.
Open Datasets: Yes. "We utilized the CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and Tiny-ImageNet (Le & Yang, 2015) datasets for our experiments."
Dataset Splits: Yes. "We utilized the CIFAR-10/100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and Tiny-ImageNet (Le & Yang, 2015) datasets for our experiments. Random crop and random horizontal flip were applied, while other augmentations were not utilized."
Hardware Specification: No. The paper does not provide specific hardware details used for running its experiments.
Software Dependencies: No. The paper does not provide ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiments.
Experiment Setup: Yes. "We utilized the CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and Tiny-ImageNet (Le & Yang, 2015) datasets with random crop and random horizontal flip, excluding other augmentations. We trained all AT, AD, and IGDM-incorporated methods using an SGD optimizer with the same initial learning rate of 0.1, momentum of 0.9, and weight decay of 5e-4. For CIFAR-10 and CIFAR-100, we adhered to the training settings of other adversarial distillation methods, training for 200 epochs, except for RSLAD and IGDM-RSLAD, which were trained for 300 epochs; RSLAD suggested that increasing the number of training epochs could enhance model robustness, so we followed their recommendation. The learning rate scheduler reduced the learning rate by a factor of 10 at the 100th and 150th epochs, except for RSLAD and IGDM-RSLAD, where it was decreased by a factor of 10 at the 215th, 260th, and 285th epochs, as suggested in the original paper. For SVHN, we trained for 50 epochs with the learning rate decayed by a factor of 10 at the 40th and 45th epochs for all methods. For Tiny-ImageNet, we trained for 100 epochs with the learning rate decayed by a factor of 10 at the 50th and 80th epochs for all methods."
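The step-decay learning-rate schedules quoted above can be summarized in a short sketch. This is a minimal illustration of the reported schedules, not the authors' (unreleased) code; the function and dictionary names are illustrative assumptions.

```python
# Illustrative sketch of the reported SGD step-decay schedules.
# All runs start at lr 0.1 and decay by a factor of 10 at each milestone.

def lr_at_epoch(epoch, initial_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Return the learning rate at a given epoch, multiplying by
    `gamma` once for every milestone epoch already reached."""
    lr = initial_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Milestones as reported in the experiment setup (names are ours):
SCHEDULES = {
    "cifar_default": (100, 150),    # 200-epoch CIFAR-10/100 runs
    "rslad": (215, 260, 285),       # 300-epoch RSLAD / IGDM-RSLAD runs
    "svhn": (40, 45),               # 50-epoch SVHN runs
    "tiny_imagenet": (50, 80),      # 100-epoch Tiny-ImageNet runs
}

print(lr_at_epoch(0))                                        # initial lr 0.1
print(lr_at_epoch(120))                                      # after first decay
print(lr_at_epoch(290, milestones=SCHEDULES["rslad"]))       # after all three decays
```

In a PyTorch training loop, the same behavior corresponds to `torch.optim.lr_scheduler.MultiStepLR` with the listed milestones and `gamma=0.1`.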