Attack on Prompt: Backdoor Attack in Prompt-Based Continual Learning
Authors: Trang Nguyen, Anh Tran, Nhat Ho
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various benchmark datasets and continual learners validate our continual backdoor framework, with further ablation studies confirming our contributions' effectiveness. In this section, we first describe the experimental setup, then present the results in five key aspects: the overall backdooring ability of AOP, its performance with different surrogate datasets, the robustness of AOP with varying attack times, the efficacy of adopting BCE in preventing the generation of adversarial perturbations, and its effectiveness compared to baselines. Metrics: The evaluation of our framework utilizes two key metrics: (1) average accuracy (ACC) and (2) attack success rate (ASR). ACC assesses the accuracy of the backdoored model on benign test samples, whereas ASR measures the proportion of attacked samples that the compromised model predicts as the target label, reflecting the backdoor attack's effectiveness. All results are averaged over 3 runs for fair comparisons. |
| Researcher Affiliation | Collaboration | Trang Nguyen1, Anh Tran1, Nhat Ho2 1VinAI Research 2The University of Texas at Austin EMAIL, EMAIL, EMAIL |
| Pseudocode | No | A comprehensive overview and the end-to-end algorithm are in the supplementary material. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code or a link to a code repository. It only mentions that further details about the algorithm are in the supplementary material, but not specifically the code itself. |
| Open Datasets | Yes | All learners utilize the ViT-B/16 backbone (Dosovitskiy et al. 2021), pre-trained on ImageNet-1K (Russakovsky et al. 2015), except for HiDe-Prompt, which is pre-trained on iBOT-1K (Zhou et al. 2022). Datasets. For the victim's training dataset, we follow existing prompt-based continual learning methods (Wang et al. 2023; Qiao et al. 2024) and use three variants of ImageNet-R (Hendrycks et al. 2020): 5-Split, 10-Split, and 20-Split ImageNet-R. Additionally, we conduct experiments on the 5-Split-CUB200 dataset, which partitions the original CUB200 (Wah et al. 2011) dataset into 5 tasks, each containing 40 classes. For the attacker's surrogate dataset, we primarily use TinyImageNet (Le and Yang 2015) for all experiments and CIFAR100 (Krizhevsky 2009) in specific settings. |
| Dataset Splits | Yes | For the victim's training dataset, we follow existing prompt-based continual learning methods (Wang et al. 2023; Qiao et al. 2024) and use three variants of ImageNet-R (Hendrycks et al. 2020): 5-Split, 10-Split, and 20-Split ImageNet-R. These variants divide the 200 classes of the original dataset into 5, 10, and 20 tasks, respectively. Additionally, we conduct experiments on the 5-Split-CUB200 dataset, which partitions the original CUB200 (Wah et al. 2011) dataset into 5 tasks, each containing 40 classes. |
| Hardware Specification | No | The paper mentions using a 'ViT-B/16 backbone', which is a model architecture, 'pre-trained on ImageNet-1K', but does not specify the hardware (e.g., specific GPU models, CPUs, or TPUs) used for training or experimentation. |
| Software Dependencies | No | The paper mentions model architectures and pre-trained models like the 'ViT-B/16 backbone' and 'iBOT-1K', but it does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Following the guidelines of (Zeng et al. 2022), we set the maximum poison ratio to 25 images, corresponding to 0.1% of ImageNet-R and 0.5% of CUB200. Additionally, we set the upper bound of the ℓ∞ norm of triggers to 16/255, in line with standard practices in the literature (Turner, Tsipras, and Madry 2019; Saha, Subramanya, and Pirsiavash 2019). During inference, the trigger is amplified by a factor of 3 (Turner, Tsipras, and Madry 2019; Zeng et al. 2022). All results are averaged over 3 runs for fair comparisons. The dynamic stage takes place over 5 rounds. The dynamic stage is iterated for 10 rounds. |
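The Dataset Splits row describes partitioning the 200 ImageNet-R classes evenly into 5, 10, or 20 class-incremental tasks. A minimal sketch of such a split, assuming an even division and an illustrative function name (`make_splits` is not from the paper):

```python
def make_splits(num_classes: int = 200, num_tasks: int = 5) -> list[list[int]]:
    """Partition class IDs 0..num_classes-1 into num_tasks equal, disjoint tasks.

    Illustrative sketch: e.g. 5-Split-ImageNet-R yields 5 tasks of 40 classes.
    Assumes num_classes is divisible by num_tasks, as in the paper's settings.
    """
    per_task = num_classes // num_tasks
    classes = list(range(num_classes))
    return [classes[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

# 5-Split: 5 tasks x 40 classes; 10-Split: 10 x 20; 20-Split: 20 x 10.
splits = make_splits(200, 5)
```

The same helper covers 5-Split-CUB200 (5 tasks of 40 classes); the actual class ordering used by the paper is not specified here.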
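The two evaluation metrics quoted in the Research Type row (ACC on benign samples, ASR on triggered samples) can be sketched as below; function names and the toy predictions are illustrative, not from the paper:

```python
import numpy as np

def average_accuracy(preds, labels) -> float:
    """ACC: accuracy of the backdoored model on benign test samples."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    return float((preds == labels).mean())

def attack_success_rate(preds_on_triggered, target_label) -> float:
    """ASR: fraction of attacked (triggered) samples that the compromised
    model predicts as the attacker's target label."""
    preds = np.asarray(preds_on_triggered)
    return float((preds == target_label).mean())

# Toy example with made-up predictions (not the paper's results):
acc = average_accuracy([0, 1, 2, 2], [0, 1, 2, 3])        # 3/4 correct
asr = attack_success_rate([7, 7, 1, 7], target_label=7)   # 3/4 hit the target
```

In the paper both metrics are additionally averaged over 3 runs.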