Attention-Imperceptible Backdoor Attacks on Vision Transformers

Authors: Zhishen Wang, Rui Wang, Lihua Jing

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the effectiveness of the proposed AIBA across multiple datasets and ViT benchmarks and explored the robustness of AIBA against current ViT-specific defense methods. The experimental results demonstrate that our backdoor attack method can successfully implant a powerful and stealthy backdoor into ViTs.
Researcher Affiliation | Academia | Zhishen Wang 1,2, Rui Wang 1,2, Lihua Jing 1,2* (1 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; 2 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China). EMAIL, EMAIL, EMAIL
Pseudocode | No | The specific algorithmic process for backdoor learning in AIBA is provided in the supplementary materials.
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available or provided in supplementary materials.
Open Datasets | Yes | To evaluate the effectiveness of the proposed AIBA, we conducted backdoor experiments using various ViT models on two common image classification datasets: ImageNet (Russakovsky et al. 2015) and CIFAR-10 (Krizhevsky, Hinton et al. 2009).
Dataset Splits | Yes | ImageNet is a benchmark dataset in computer vision, comprising 1.28 million training images and 50,000 validation images across 1,000 class labels. CIFAR-10, on the other hand, includes a total of 60,000 color images categorized into 10 classes, with 50,000 designated for training and 10,000 for testing. ... We construct our mixed poisoned dataset with a poisoning rate of ρ = 0.1.
Hardware Specification | No | The paper discusses the models and datasets used but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper describes the methodology and experimental setup but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | We resize the input images to dimensions of 3 × 224 × 224 and set the patch size to 16, generating Np = 196 patches. ... In the Attention-Imperceptible Trigger Generation process, we select the top Nm = 24 patches that capture the model's highest attention to embed the imperceptible trigger. During trigger generation, we use the ℓ∞ norm to constrain the trigger and limit the perturbation range to ε = 4/255 to maintain stealthiness. The learning rate for trigger optimization, lr_p, is set to 0.01, and N_injection is set to 1. In the model poisoning process, we fine-tune the ViTs for 1 epoch with a batch size of 64 and a learning rate of 1e-5.
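The Dataset Splits row pins down the poisoning rate, and the implied poisoned-sample count is easy to check. A minimal sketch in NumPy; the helper name `build_poison_indices` is illustrative and not from the paper:

```python
import numpy as np

def build_poison_indices(n_train: int, rho: float = 0.1, seed: int = 0) -> np.ndarray:
    """Sample, without replacement, the training indices to poison at rate rho."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_train, size=int(rho * n_train), replace=False)

# CIFAR-10 has 50,000 training images, so rho = 0.1 poisons 5,000 of them.
poison_idx = build_poison_indices(50_000, rho=0.1)
```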
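The Experiment Setup row fixes most of the numbers needed to reproduce the trigger-placement step: 196 patches from a 224 × 224 image with 16 × 16 patches, the top Nm = 24 attention patches, and an ℓ∞ budget of ε = 4/255. A minimal NumPy sketch under those settings; the attention scores are a random stand-in for the model's real attention, and `select_top_attention_patches` / `clip_trigger` are hypothetical helper names, not the paper's implementation:

```python
import numpy as np

IMG, PATCH = 224, 16
NP = (IMG // PATCH) ** 2   # 196 patches, matching the paper's setup
NM = 24                    # top-attention patches that receive the trigger
EPS = 4 / 255              # l_inf perturbation budget

def select_top_attention_patches(attn_scores: np.ndarray, n_top: int = NM) -> np.ndarray:
    """Return indices of the n_top patches with the highest attention, descending."""
    return np.argsort(attn_scores)[::-1][:n_top]

def clip_trigger(delta: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Project a trigger perturbation back into the l_inf ball of radius eps."""
    return np.clip(delta, -eps, eps)

rng = np.random.default_rng(0)
attn = rng.random(NP)                         # stand-in for attention over the 196 patches
top_idx = select_top_attention_patches(attn)  # 24 patch indices to embed the trigger in
delta = clip_trigger(rng.normal(0.0, 0.1, (PATCH, PATCH, 3)))
```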