Explaining the Role of Intrinsic Dimensionality in Adversarial Training

Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval-augmented generation setups, demonstrating superior robustness with comparable generalization to standard training." (Section 6: Experiments)
Researcher Affiliation | Academia | ¹Qatar Computing Research Institute, HBKU, Doha, Qatar; ²Faculty of Computer Science, Dalhousie University, Halifax, Canada. Correspondence to: Enes Altinisik <EMAIL>.
Pseudocode | Yes | Algorithm 1: SMAAT
Open Source Code | Yes | "The code is publicly available at: https://github.com/EnesAltinisik/SMAAT-25/tree/main"
Open Datasets | Yes | AGNEWS (Zhang et al., 2015), IMDB (Maas et al., 2011), and YELP (Zhang et al., 2015) datasets... LAT dataset (Sheshadri et al., 2024)... MT-Bench (Zheng et al., 2024)... AdvBench dataset (Zou et al., 2023) and the Helpfulness-Harmlessness dataset (HH-RLHF) (Bai et al., 2022)... Natural Questions (NQ) (Kwiatkowski et al., 2019)... UltraChat dataset (Ding et al., 2023)... HarmBench (Mazeika et al., 2024)... GLUE and AdvGLUE benchmarks
Dataset Splits | Yes | "For testing, we use a subset of 1,000 test samples from each dataset, following previous work practices... In addition, we randomly sample 10% of the training set for validation in all datasets. For the YELP dataset, we fine-tuned a RoBERTa model for 2 epochs with a learning rate of 1e-05 and a batch size of 32."
Hardware Specification | Yes | "In our evaluation, we use a V100 GPU with 32 GB memory and 64 CPUs."
Software Dependencies | No | The paper mentions using the TextAttack framework (Morris et al., 2020) and NeuroX (Dalvi et al., 2023) but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "To train the last layer of fθ with adversarial samples, we create adversarial samples using 5-step PGD attacks. During training, we use epsilon values of 0.1, 0.1, and 0.8 for the YELP, AGNEWS, and IMDB datasets, respectively, for the BERT models. For the RoBERTa models, we employ epsilon values of 0.1, 0.6, and 0.03. All models are trained for 10 epochs with a learning rate of 0.1. The model is trained with a learning rate of 2e-4, applying LAT at every even-numbered layer with norm bounds ranging from 1 to 5. In the case of SMAAT, we conducted a grid search over the learning rate, ranging from 0.1 to 0.001, and the ϵ value, ranging from 0.8 to 0.01, using 3-step PGD. In all cases, standard models are trained over 5 epochs with a learning rate of 1e-5. Table 6 details the training hyperparameters for SMAAT."
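The "randomly sample 10% of the training set for validation" protocol quoted under Dataset Splits can be sketched as follows. This is an illustrative stdlib-only sketch, not the authors' code; the `train_val_split` helper, the fixed seed, and the toy data are all assumptions.

```python
import random

def train_val_split(examples, val_frac=0.10, seed=0):
    """Randomly hold out a fraction of the training set for validation."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)          # seeded shuffle for reproducibility
    n_val = int(len(examples) * val_frac)     # size of the 10% validation split
    val = [examples[i] for i in idx[:n_val]]
    train = [examples[i] for i in idx[n_val:]]
    return train, val
```

Fixing the seed makes the split reproducible across runs, which matters when comparing the adversarially trained and standard models on identical validation data.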
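The k-step PGD attack quoted in the Experiment Setup row (5 steps, per-dataset ϵ) generates adversarial examples in a continuous space, here taken to be the embedding input of the last layer that SMAAT adversarially trains. The sketch below is a minimal NumPy illustration under that assumption, not the authors' implementation: the logistic "last layer", the step size `alpha`, and the L-infinity threat model are all choices made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic last layer on input x."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def loss_grad_x(w, b, x, y):
    """Closed-form d(BCE)/dx for the logistic layer: (p - y) * w."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def pgd_attack(w, b, x, y, eps=0.1, alpha=0.04, steps=5):
    """L-inf PGD: k gradient-sign ascent steps, projected into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the ball
    return x_adv
```

During SMAAT-style training, `pgd_attack` would be called on each batch and the resulting `x_adv` used in place of `x` when updating the last layer's weights; the quoted ϵ values (e.g. 0.1 for YELP with BERT) set the radius of the projection ball.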