Long-tailed Adversarial Training with Self-Distillation

Authors: Seungju Cho, Hongsin Lee, Changick Kim

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate state-of-the-art performance in both clean and robust accuracy for long-tailed adversarial robustness, with significant improvements in tail class performance on various datasets. We improve the accuracy against PGD attacks for tail classes by 20.3, 7.1, and 3.8 percentage points on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively, while achieving the highest robust accuracy.
Researcher Affiliation | Academia | Seungju Cho, Hongsin Lee, Changick Kim; Korea Advanced Institute of Science and Technology (KAIST)
Pseudocode | Yes | Algorithm 1: Main Algorithm
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Dataset. We conducted experiments using long-tailed distribution datasets: CIFAR-10-LT, CIFAR-100-LT (Krizhevsky et al., 2009), and Tiny-ImageNet-LT (Le & Yang, 2015), with various imbalance ratios (IR), primarily set at 50 for CIFAR-10-LT and 10 for CIFAR-100-LT and Tiny-ImageNet-LT.
Dataset Splits | No | The paper uses long-tailed versions of CIFAR-10, CIFAR-100, and Tiny-ImageNet and reports "Test Accuracy" and "Tail clean accuracy", implying a held-out test set, but it does not state explicit train/validation/test split percentages or sample counts, nor does it reference standard splits with citations or file names.
Hardware Specification | No | The paper does not give any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications.
Software Dependencies | No | The paper names model architectures (ResNet-18, WideResNet-34-10, PreActResNet-18) and adversarial training methods (PGD-AT, TRADES, MART, AWP), but it does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python 3.x).
Experiment Setup | Yes | Training details. We employed ResNet-18 (He et al., 2016a) and WideResNet-34-10 (Zagoruyko & Komodakis, 2016) architectures for CIFAR-10/100-LT; results for WideResNet-34-10 are included in the appendix. For Tiny-ImageNet-LT, we employed PreActResNet-18 (He et al., 2016b). Initially, we trained a balanced self-teacher with the same model architecture for 30 epochs, using a batch size of 32 on a balanced dataset resampled from the original long-tailed dataset with γ = IR/2. In the main training phase, we trained for 100 epochs with a batch size of 128, using self-distillation from the balanced self-teacher. We used SGD to train both the balanced self-teacher and the main model, setting the learning rate to 0.1 and weight decay to 5 × 10⁻⁴. We used an epsilon boundary of 8/255, a commonly used setting in adversarial training, and employed a 10-step PGD attack during training.
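
The Open Datasets row mentions imbalance ratios (IR) without the paper's exact subsampling rule being quoted. A common construction for CIFAR-10-LT-style benchmarks decays per-class sample counts exponentially from the head class to the tail class; the sketch below assumes that profile, and the helper name `long_tailed_counts` is illustrative, not from the paper.

```python
def long_tailed_counts(n_max: int, num_classes: int, imbalance_ratio: float) -> list[int]:
    """Per-class sample counts under an exponential long-tail profile.

    Class 0 keeps n_max samples; class (num_classes - 1) keeps
    n_max / imbalance_ratio samples; intermediate classes decay
    geometrically. This is a common LT-benchmark convention, assumed
    here since the paper excerpt does not spell out its rule.
    """
    return [
        int(n_max * imbalance_ratio ** (-k / (num_classes - 1)))
        for k in range(num_classes)
    ]

# Example: CIFAR-10-LT with IR = 50 (5000 samples in the head class).
counts = long_tailed_counts(5000, 10, 50)
```

With IR = 50 the head class keeps all 5000 training images while the tail class keeps only 100, which is why tail-class robustness (the paper's focus) is hard to obtain.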
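
The Experiment Setup row quotes an ε = 8/255 bound with a 10-step PGD attack. As a hedged illustration of that attack loop (not the paper's code), here is a toy sketch on a 1-D logistic model: each step moves the input in the sign of the loss gradient, then projects back into the L∞ ball of radius ε around the clean input. The step size `alpha = 2/255` is a common default that the excerpt does not state, and `pgd_attack` is a hypothetical helper name.

```python
import math

def pgd_attack(x, y, w, b, eps=8/255, alpha=2/255, steps=10):
    """10-step PGD sketch on a toy 1-D logistic model p = sigmoid(w*x + b).

    For binary cross-entropy loss, the gradient w.r.t. the input is
    (p - y) * w, so each step adds alpha * sign(grad), then projects
    into [x - eps, x + eps] and the valid pixel range [0, 1].
    """
    x_adv = x
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-(w * x_adv + b)))
        grad = (p - y) * w                      # dLoss/dx for BCE
        x_adv += alpha if grad > 0 else -alpha  # signed gradient step
        x_adv = min(max(x_adv, x - eps), x + eps)  # L-infinity projection
        x_adv = min(max(x_adv, 0.0), 1.0)          # keep valid pixel range
    return x_adv
```

Since 10 × (2/255) exceeds 8/255, the projection is what actually binds: the adversarial input ends up on the boundary of the ε-ball, e.g. `pgd_attack(0.5, 0, 1.0, 0.0)` returns 0.5 + 8/255.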