Calibration Attacks: A Comprehensive Study of Adversarial Attacks on Model Confidence
Authors: Stephen Obadinma, Xiaodan Zhu, Hongyu Guo
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose four typical forms of calibration attacks: underconfidence, overconfidence, maximum miscalibration, and random confidence attacks, conducted in both black-box and white-box setups. We demonstrate that the attacks are highly effective on both convolutional and attention-based models... We further investigate the effectiveness of a wide range of adversarial defence and recalibration methods... From the ECE and KS scores, we observe that there are still significant limitations... Section 4 Experiments |
| Researcher Affiliation | Academia | Stephen Obadinma EMAIL Department of Electrical and Computer Engineering & Ingenuity Labs Research Institute, Queen's University; Xiaodan Zhu EMAIL Department of Electrical and Computer Engineering & Ingenuity Labs Research Institute, Queen's University; Hongyu Guo EMAIL Digital Technologies Research Centre, National Research Council Canada |
| Pseudocode | Yes | Algorithm 1 A Brief Overview of Our Calibration Attack Framework |
| Open Source Code | Yes | Our code is available at https://github.com/PhenetOs/CalibrationAttack |
| Open Datasets | Yes | We performed a comprehensive study on CIFAR-100 (Krizhevsky & Hinton, 2009) and Caltech101 (Fei-Fei et al., 2004). We also included the German Traffic Sign Recognition Benchmark (GTSRB) (Houben et al., 2013) |
| Dataset Splits | Yes | For CIFAR-100 and GTSRB, we use the predefined training and test sets for both but use 10% of the training data for validation purposes. For Caltech-101, which comes without predetermined splits, we use an 80:10:10 train/validation/test split. |
| Hardware Specification | Yes | All of the training occurred on 24 GB Nvidia RTX-3090 and RTX Titan GPUs. |
| Software Dependencies | No | The paper names individual tools (e.g., "We use the Foolbox implementation of the PGD attack (Rauber et al., 2020; 2017)") but does not provide a full list of software dependencies with version numbers. |
| Experiment Setup | Yes | The hyperparameters we used for training the ResNet-50 models include: a batch size of 128, with a Cosine Annealing LR scheduler, 0.9 momentum, 5e-4 weight decay, and a stochastic gradient descent (SGD) optimizer. For ViT, the settings are the same, except we also use gradient clipping with the max norm set to 1.0. We conduct basic grid search hyperparameter tuning over a few values for the learning rate (0.1, 0.01, 0.005, 0.001) and training duration (in terms of epochs). Generally, we found that a learning rate of 0.01 worked best for both types of models. The training times vary for each dataset and model. For the ResNet-50 models we trained for 15 epochs on CIFAR-100, 10 epochs on Caltech-101, and 7 epochs on GTSRB. Likewise for ViT, we trained for 10 epochs on CIFAR-100, 15 epochs on Caltech-101, and 5 epochs on GTSRB. |
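To make the reported training schedule concrete, the sketch below reproduces the cosine annealing learning-rate curve for the ResNet-50 / CIFAR-100 setup (base learning rate 0.01, 15 epochs). This is a minimal stand-alone illustration, not the authors' code: the minimum learning rate `eta_min=0.0` and per-epoch stepping are assumptions, since the paper does not state them.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, base_lr=0.01, eta_min=0.0):
    """Cosine-annealed learning rate for a given epoch.

    Follows the standard cosine annealing formula:
        eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / total_epochs)) / 2
    eta_min=0.0 is an assumption; the paper only reports the base rate.
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs)) / 2

# Schedule for the reported ResNet-50 / CIFAR-100 run: lr 0.01 over 15 epochs.
schedule = [cosine_annealing_lr(t, 15) for t in range(16)]
print(schedule[0])    # starts at the base learning rate, 0.01
print(schedule[15])   # decays to eta_min (0.0 here) by the final step
```

The same curve applies to the other dataset/model pairs by swapping in their reported epoch counts; for ViT, the paper additionally clips gradients to a max norm of 1.0 during each update.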