Explaining the role of Intrinsic Dimensionality in Adversarial Training
Authors: Enes Altinisik, Safa Messaoud, Husrev Taha Sencar, Hassan Sajjad, Sanjay Chawla
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate SMAAT across multiple tasks, including text generation, sentiment classification, safety filtering, and retrieval-augmented generation setups, demonstrating superior robustness with comparable generalization to standard training. (Section 6, Experiments) |
| Researcher Affiliation | Academia | 1Qatar Computing Research Institute, HBKU, Doha, Qatar 2Faculty of Computer Science, Dalhousie University, Halifax, Canada. Correspondence to: Enes Altinisik <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 SMAAT |
| Open Source Code | Yes | The code is publicly available at: https://github.com/EnesAltinisik/SMAAT-25/tree/main |
| Open Datasets | Yes | AGNEWS (Zhang et al., 2015), IMDB (Maas et al., 2011), and YELP (Zhang et al., 2015) datasets... LAT dataset (Sheshadri et al., 2024)... MT-Bench (Zheng et al., 2024)... AdvBench dataset (Zou et al., 2023) and the Helpfulness Harmfulness dataset (HH-RLHF) (Bai et al., 2022)... Natural Questions (NQ) (Kwiatkowski et al., 2019)... UltraChat dataset (Ding et al., 2023)... HarmBench (Mazeika et al., 2024)... GLUE and AdvGLUE benchmarks |
| Dataset Splits | Yes | For testing, we use a subset of 1000 test samples from each dataset, following previous work practices... In addition, we randomly sample 10% of the training set for validation in all datasets. For the YELP dataset, we created a fine-tuned RoBERTa model for 2 epochs with a learning rate of 1e-05 and a batch size of 32. |
| Hardware Specification | Yes | In our evaluation, we use a V100 GPU with 32 GB memory and 64 CPUs. |
| Software Dependencies | No | The paper mentions using the TextAttack framework (Morris et al., 2020) and NeuroX (Dalvi et al., 2023) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To train the last layer of fθ with adversarial samples, we create adversarial samples using 5-step PGD attacks. During training, we use epsilon values of 0.1, 0.1, and 0.8 for the YELP, AGNEWS, and IMDB datasets, respectively, for the BERT models. For the RoBERTa models, we employ epsilon values of 0.1, 0.6, and 0.03. All models are trained for 10 epochs with a learning rate of 0.1. The model is trained with a learning rate of 2e-4, applying LAT at every even-numbered layer with norm bounds ranging from 1 to 5. In the case of SMAAT, we conducted a grid search for the learning rate, ranging from 0.1 to 0.001, and the ϵ value, ranging from 0.8 to 0.01, using 3 PGD steps. In all cases, standard models are trained over 5 epochs with a learning rate of 1e-5. |
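The setup above generates adversarial samples with 5-step L∞ PGD at a given epsilon. As a point of reference, the attack loop can be sketched on a toy logistic-regression loss; this is an illustrative stand-in, not the paper's implementation, and the `pgd_attack` function, step size `alpha`, and toy model are assumptions (only `eps=0.1` and `steps=5` come from the table above).

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.04, steps=5):
    """L-inf PGD on a logistic-regression loss (toy stand-in for
    attacking last-layer inputs; eps/steps mirror the 5-step,
    eps=0.1 setting quoted above)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = (x + delta) @ w + b
        p = 1.0 / (1.0 + np.exp(-z))       # sigmoid prediction
        grad = (p - y) * w                 # d(BCE loss)/d(input)
        # ascend the loss, then project back into the eps-ball
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return x + delta

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.0
x, y = rng.normal(size=4), 1.0
x_adv = pgd_attack(x, y, w, b)
print(np.max(np.abs(x_adv - x)) <= 0.1 + 1e-9)  # prints True: stays in eps-ball
```

In adversarial training, `x_adv` would replace (or augment) `x` in the training batch for the layer being hardened.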