Stealthy Backdoor Attack via Confidence-driven Sampling

Authors: Pengfei He, Yue Xing, Han Xu, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang, Makoto Yamada, Mohammad Sabokrou

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide theoretical insights and conduct extensive experiments to demonstrate the effectiveness of the proposed method."
Researcher Affiliation | Academia | Pengfei He, Jie Ren, Yingqian Cui, Shenglai Zeng, Jiliang Tang (Department of Computer Science and Engineering, Michigan State University); Yue Xing (Department of Statistics and Probability, Michigan State University); Han Xu (Department of Electrical and Computer Engineering, University of Arizona); Makoto Yamada, Mohammad Sabokrou (Machine Learning and Data Science (MLDS), Okinawa Institute of Science and Technology)
Pseudocode | Yes | "The detailed algorithm is shown in Algorithm 1."
Algorithm 1 CBS
Input: clean training set Dtr = {(xi, yi)}, i = 1, ..., N; model f(·; θ); pre-train epochs E; threshold ϵ; target class yt
Output: poisoned sample set U, poisoned label set Sp
Pre-train the surrogate model f on Dtr for E epochs and obtain f(·; θ)
Initialize poisoned sample set U = {}
for i = 1, ..., N do
  if |sc(f(xi; θ))_yi − sc(f(xi; θ))_yt| ≤ ϵ then
    U = U ∪ {(xi, yi)}
  end if
end for
Return poisoned sample set U
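The selection loop of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the paper's repository: the function name `cbs_select` and the array-based interface are assumptions, and `sc(·)` is taken to be the softmax confidence vector produced by the pre-trained surrogate model.

```python
import numpy as np

def cbs_select(probs, labels, target_class, eps=0.2):
    """Sketch of CBS sampling: keep samples whose confidence gap
    between the true class and the target class is at most eps,
    i.e. samples lying near the decision boundary between them.

    probs:  (N, C) softmax outputs of the pre-trained surrogate model
    labels: (N,) ground-truth class indices
    Returns the indices of selected (to-be-poisoned) samples.
    """
    idx = np.arange(len(labels))
    # |sc(f(x))_{y_i} - sc(f(x))_{y_t}| for every sample
    gap = np.abs(probs[idx, labels] - probs[:, target_class])
    return np.where(gap <= eps)[0]
```

With ϵ = 0.2, a sample confidently classified into its true class (large gap) is skipped, while a boundary sample with nearly equal confidence on the true and target classes is selected.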
Open Source Code | Yes | "Code can be found in https://github.com/PengfeiHePower/boundary-backdoor."
Open Datasets | Yes | Experiments use CIFAR-10 and CIFAR-100 (Krizhevsky et al., 2009) image classification tasks, Tiny-ImageNet (Le & Yang, 2015), ImageNet-1k (Russakovsky et al., 2015), and GTSRB, a traffic sign recognition benchmark.
Dataset Splits | Yes | "For CBS, we set ϵ = 0.2 and the corresponding poison rate is 0.2%, which is also applied for Random and FUS to guarantee that poisoning rates are the same for all sampling methods. We designate different subset rates (10%, 5%, 1%) of the training set as accessible to the attacker, who can only poison this fraction of the data."
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) used for running experiments are mentioned in the paper; it only references model architectures (ResNet18, VGG16).
Software Dependencies | No | No specific software versions (e.g., Python 3.x, PyTorch 1.x) are mentioned. The paper refers to model architectures and optimizers but not the underlying software environment with version numbers.
Experiment Setup | Yes | "The surrogate model is trained on the clean training set via SGD for 60 epochs, with an initial learning rate of 0.01 reduced by a factor of 0.1 after epochs 30 and 50. We implement CBS according to Algorithm 1... For CBS, we set ϵ = 0.2 and the corresponding poison rate is 0.2%, also applied for Random and FUS... We retrain victim models on poisoned training data from scratch via SGD for 200 epochs with an initial learning rate of 0.1, decayed by 0.1 at epochs 100 and 150."
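The two step-decay learning-rate schedules quoted above (surrogate: 0.01 with drops at epochs 30 and 50; victim: 0.1 with drops at epochs 100 and 150) can be expressed as a small helper. The function below is an illustrative sketch of that schedule, not code from the paper; in PyTorch the same behavior is typically obtained with `torch.optim.lr_scheduler.MultiStepLR`.

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Step-decay schedule: start at base_lr and multiply by gamma
    at each milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Victim-model retraining schedule from the quoted setup:
#   epochs 0-99 -> 0.1, 100-149 -> 0.01, 150-199 -> 0.001
# Surrogate schedule: lr_at_epoch(e, base_lr=0.01, milestones=(30, 50))
```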