Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

TMLR 2024

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  "To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. ... We conduct thorough evaluations of BNNs trained with well-established and state-of-the-art approximate inference methods (HMC, Neal (2010); PSVI, Blundell et al. (2015); MCD, Gal and Ghahramani (2016); FSVI, Rudner et al. (2022b)) on benchmarking tasks such as MNIST, Fashion MNIST, and CIFAR-10."
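The "relatively unsophisticated attacks" on task (1) amount to gradient attacks against the posterior predictive mean. A minimal FGSM-style sketch on a toy linear Bayesian classifier is below; all names are illustrative, and averaging per-draw cross-entropy gradients over posterior samples is a common expectation-over-models approximation, not necessarily the paper's exact attack:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predictive_mean(x, weight_samples):
    """Posterior predictive mean: average softmax output over posterior draws."""
    return np.mean([softmax(W @ x) for W in weight_samples], axis=0)

def fgsm_predictive_mean(x, y, weight_samples, eps):
    """One FGSM step against the predictive mean for linear logits W @ x,
    averaging per-draw cross-entropy gradients over posterior samples."""
    grad = np.zeros_like(x)
    for W in weight_samples:
        p = softmax(W @ x)
        dlogits = p.copy()
        dlogits[y] -= 1.0                      # grad of CE w.r.t. logits for this draw
        grad += W.T @ dlogits / len(weight_samples)
    # Ascend the loss within an eps-ball, keeping pixels in [0, 1].
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)
```

With a single posterior draw this reduces to ordinary FGSM; with many draws it targets the Bayesian model average rather than any individual network.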
Researcher Affiliation: Academia
  Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, and Julia Kempe, all affiliated with New York University.
Pseudocode: No
  The paper describes its methods in text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks (e.g., sections titled "Pseudocode" or "Algorithm").
Open Source Code: Yes
  "Reproducibility. Code to reproduce our results can be found at https://github.com/timrudner/attacking-bayes"
Open Datasets: Yes
  "We conduct thorough evaluations of BNNs trained with well-established and state-of-the-art approximate inference methods (HMC, Neal (2010); PSVI, Blundell et al. (2015); MCD, Gal and Ghahramani (2016); FSVI, Rudner et al. (2022b)) on benchmarking tasks such as MNIST, Fashion MNIST, and CIFAR-10. ... Our semantic shift datasets for MNIST, Fashion MNIST and CIFAR-10 are Fashion MNIST, MNIST, and SVHN, respectively, each of them giving zero accuracy."
Dataset Splits: Yes
  "We evaluate all AE detectors on test data consisting of 50% clean samples and 50% adversarially perturbed samples, using total uncertainty for the rejection as described in Section 2.2. ... Our semantic shift datasets for MNIST, Fashion MNIST and CIFAR-10 are Fashion MNIST, MNIST, and SVHN, respectively, each of them giving zero accuracy. The test set contains half in-distribution (ID) and half semantically-shifted out-of-distribution (OOD) samples, hence selective accuracy curves start at 50% accuracy. ... To optimize GPU memory usage, we use 10,000 training and 5,000 validation samples from the MNIST dataset."
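The 50/50 ID/OOD composition is why selective accuracy curves start at 50%: the OOD half contributes zero accuracy by construction, and rejecting by uncertainty can only recover ID points. A small sketch with synthetic data (all names and numbers here are illustrative, not the paper's):

```python
import numpy as np

def selective_accuracy(correct, uncertainty, fractions):
    """Accuracy over the retained fraction of most-certain predictions."""
    order = np.argsort(uncertainty)              # most certain first
    correct = np.asarray(correct, dtype=float)[order]
    return [correct[: max(1, int(f * len(correct)))].mean() for f in fractions]

# Synthetic 50/50 test set: 100 in-distribution points (assume all classified
# correctly) and 100 semantically shifted OOD points (zero accuracy by
# construction, as for MNIST vs. Fashion MNIST in the paper's setup).
rng = np.random.default_rng(0)
correct = np.concatenate([np.ones(100), np.zeros(100)])
# Idealized uncertainty: strictly lower on every ID sample than on any OOD one.
uncertainty = np.concatenate([rng.uniform(0.0, 0.4, 100),
                              rng.uniform(0.6, 1.0, 100)])
curve = selective_accuracy(correct, uncertainty, [0.5, 1.0])
# Retaining the most certain half keeps only ID points; retaining everything
# yields the 50% floor the paper describes.
```

An attack on the uncertainty estimate would push the OOD samples' uncertainty below the ID samples', flattening this curve toward 50% at every retention fraction.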
Hardware Specification: No
  "This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise." The paper acknowledges generic high-performance computing resources but does not provide specific hardware details such as GPU/CPU models, memory, or processor types.
Software Dependencies: No
  "We implement hmc using the Hamiltorch package from Cobb and Jalaian (2021)..." The paper names the Hamiltorch package and implies PyTorch (torch.nn.CrossEntropyLoss) and TensorFlow/Keras (tf.nn.sparse_softmax_cross_entropy_with_logits, keras.models.Model), but it does not specify version numbers for any of these software components.
Experiment Setup: Yes
  "All hyperparameter details can be found in Appendix E. ... The hyperparameters for FSVI, PSVI, and MCD are shown in Table 7, Table 8, and Table 9. ... For HMC, we train the model for 20 steps with 0.001 as the step size." Tables 7, 8, and 9 provide specific values for Prior Var, Prior Mean, Epochs, Batch Size, Context Batch Size, Learning Rate, Momentum, Weight Decay, Alpha, Reg Scale, and Dropout Rate across methods and datasets.
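The reported HMC settings (20 steps at step size 0.001) are the parameters of the leapfrog integrator at the core of each HMC proposal. A generic leapfrog sketch under those settings follows; this is an illustration of the integrator, not the paper's Hamiltorch implementation:

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, step_size=0.001, num_steps=20):
    """Leapfrog integration of Hamiltonian dynamics for one HMC proposal.
    Defaults mirror the reported setting of 20 steps at step size 0.001."""
    theta, p = theta.copy(), p.copy()
    p = p + 0.5 * step_size * grad_log_post(theta)       # initial half momentum step
    for _ in range(num_steps - 1):
        theta = theta + step_size * p                    # full position step
        p = p + step_size * grad_log_post(theta)         # full momentum step
    theta = theta + step_size * p
    p = p + 0.5 * step_size * grad_log_post(theta)       # final half momentum step
    return theta, p
```

For a standard Gaussian posterior (`grad_log_post = lambda t: -t`), the Hamiltonian `0.5 * (p**2 + theta**2)` is conserved up to O(step_size**2) error, which is what keeps HMC's Metropolis acceptance rate high at small step sizes like 0.001.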