HoneypotNet: Backdoor Attacks Against Model Extraction

Authors: Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma

AAAI 2025

Reproducibility Assessment (each entry lists the variable, its result, and the supporting LLM response)
Research Type: Experimental
"We empirically demonstrate on four commonly used benchmark datasets that HoneypotNet can inject backdoors into substitute models with a high success rate. ... Experiments on four commonly used datasets show that our HoneypotNet defense achieves attack success rates between 56.99% and 92.35% on substitute models. ... Table 2 presents the results with a query budget of 30,000. ... We employ three metrics: Clean Test Accuracy (Acc_c), Verification Test Accuracy (Acc_v), and Attack Success Rate (ASR)."
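The three metrics named in the quote can be illustrated with a minimal sketch, assuming their standard definitions (accuracy as the fraction of correct predictions; ASR as the fraction of trigger-stamped inputs classified as the target class). The function and variable names below are mine, not the paper's.

```python
# Toy illustration of the metrics: clean/verification test accuracy
# and attack success rate (ASR). Names are illustrative placeholders.

def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def attack_success_rate(preds_on_triggered, target_class):
    """Fraction of trigger-stamped inputs mapped to the target class."""
    return sum(p == target_class for p in preds_on_triggered) / len(preds_on_triggered)

clean_preds  = [0, 1, 2, 2]
clean_labels = [0, 1, 2, 1]
print(accuracy(clean_preds, clean_labels))      # 0.75
print(attack_success_rate([9, 9, 9, 1], 9))     # 0.75
```

The same `accuracy` helper covers both Acc_c (clean test set) and Acc_v (verification set of trigger-stamped inputs); only the evaluation data differs.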
Researcher Affiliation: Academia
"Yixu Wang1,2*, Tianle Gu2,3, Yan Teng2, Yingchun Wang2, Xingjun Ma1,2. 1Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; 2Shanghai Artificial Intelligence Laboratory; 3Tsinghua University. Corresponding authors: <EMAIL, EMAIL>"
Pseudocode: Yes
Algorithm 1: HoneypotNet
Input: Victim model F, shadow dataset Ds. Output: Honeypot layer H, trigger δ.
 1: Initialize H, Fs, and δ
 2: for epoch in o do
 3:   Select i samples from Ds and query H
 4:   for epoch in o do                ▷ Extraction simulation
 5:     L = Σ_{x∈Ds} L(Fs(x), H(x))
 6:     Fs ← update(Fs, L)
 7:   end for
 8:   for epoch in o do                ▷ Trigger generation
 9:     Update δ according to Eq. (6)
10:   end for
11:   for epoch in o do                ▷ Finetuning
12:     Calculate the loss L according to Eq. (5)
13:     H ← update(H, L)
14:   end for
15: end for
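The alternating schedule in Algorithm 1 (extraction simulation, then trigger generation, then honeypot finetuning, repeated each outer iteration) can be sketched in Python. The model updates are stubbed out as phase labels; every name here is an illustrative stand-in, not the authors' implementation.

```python
# Sketch of the bi-level optimization (BLO) schedule from Algorithm 1.
# The actual gradient updates (Eq. 5 and Eq. 6 in the paper) are
# replaced by phase labels so the control flow stands out.

def run_blo(n_iterations=30, n_epochs=5):
    """Alternate the three inner phases for each outer BLO iteration."""
    log = []  # records which phase ran, purely for illustration
    for _ in range(n_iterations):
        # (1) Extraction simulation: train substitute F_s on H's outputs.
        for _ in range(n_epochs):
            log.append("extract")    # F_s <- update(F_s, L)
        # (2) Trigger generation: update the trigger delta (Eq. 6).
        for _ in range(n_epochs):
            log.append("trigger")    # delta <- update(delta)
        # (3) Finetuning: finetune the honeypot layer H (Eq. 5).
        for _ in range(n_epochs):
            log.append("finetune")   # H <- update(H, L)
    return log

schedule = run_blo(n_iterations=2, n_epochs=1)
# -> ['extract', 'trigger', 'finetune', 'extract', 'trigger', 'finetune']
```

With the paper's reported configuration (30 outer iterations, 5 epochs per phase), this schedule would run 450 inner epochs in total.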
Open Source Code: No
The paper does not provide a link to a source code repository, nor does it state that the code will be released in supplementary materials or otherwise.
Open Datasets: Yes
"Victim and Shadow Models: Following previous works (Orekondy, Schiele, and Fritz 2019a,b), the victim models we consider are ResNet34 (He et al. 2016) models trained on four datasets: CIFAR10, CIFAR100 (Krizhevsky, Hinton et al. 2009), Caltech256 (Griffin, Holub, and Perona 2007), and CUBS200 (Wah et al. 2011). ... Attack and Shadow Datasets: We chose the ImageNet (Russakovsky et al. 2015) dataset as the attack dataset, which contains 1.2M images. ... For the shadow dataset, we randomly select 5,000 images from the CC3M (Sharma et al. 2018) dataset due to its distinct distribution from ImageNet."
Dataset Splits: Yes
"Dtest is the victim model's test set. ... Victim and Shadow Models: ... models trained on four datasets: CIFAR10, CIFAR100 (Krizhevsky, Hinton et al. 2009), Caltech256 (Griffin, Holub, and Perona 2007), and CUBS200 (Wah et al. 2011). Attack and Shadow Datasets: We chose the ImageNet (Russakovsky et al. 2015) dataset as the attack dataset. ... For the shadow dataset, we randomly select 5,000 images from the CC3M (Sharma et al. 2018) dataset..."
Hardware Specification: No
The paper does not specify the hardware used for its experiments, such as GPU or CPU models.
Software Dependencies: No
The paper mentions optimizers such as SGD but does not list software dependencies (e.g., Python, PyTorch, TensorFlow) with version numbers.
Experiment Setup: Yes
"Training and Extraction Configuration: We perform BLO for 30 iterations, with each iteration comprising three steps: (1) Extraction Simulation: We train a ResNet18 for 5 epochs using SGD (momentum 0.9, learning rate 0.1, cosine annealing) on a transfer set generated by querying the honeypot layer. (2) Trigger Generation: We update the trigger for 5 epochs with momentum 0.9. (3) Finetuning: We finetune the honeypot layer for 5 epochs using SGD (momentum 0.9, learning rate 0.02, cosine annealing). For models with a small input image size (CIFAR10 and CIFAR100), we select a 6x6 square located 4 pixels away from the upper-left corner as the trigger location. For models with a larger input image size (Caltech256 and CUBS200), we choose a 28x28 square trigger at the same location. For simplicity, the last class is designated as the target class. Following previous work (Orekondy, Schiele, and Fritz 2019a,b; Pal et al. 2020), we train substitute models for 200 epochs using SGD (momentum 0.9, learning rate 0.02, cosine annealing)."
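The trigger placement described above (a 6x6 square patch 4 pixels from the upper-left corner for CIFAR-scale inputs, 28x28 at the same offset for larger inputs) can be sketched with plain-Python grids standing in for image tensors. The helper name and grid representation are mine, purely for illustration.

```python
# Sketch of the trigger placement from the setup: paste a k x k patch
# into an image at (offset, offset) from the upper-left corner.
# Pure-Python nested lists stand in for image tensors.

def apply_trigger(image, trigger, offset=4):
    """Return a copy of `image` with `trigger` pasted at (offset, offset)."""
    out = [row[:] for row in image]   # copy so the original is untouched
    k = len(trigger)                  # trigger side length (6 or 28)
    for i in range(k):
        for j in range(k):
            out[offset + i][offset + j] = trigger[i][j]
    return out

image   = [[0] * 32 for _ in range(32)]   # blank 32x32 "CIFAR" image
trigger = [[1] * 6 for _ in range(6)]     # 6x6 trigger patch
stamped = apply_trigger(image, trigger)
print(stamped[4][4], stamped[9][9], stamped[10][10])  # 1 1 0
```

For Caltech256/CUBS200-scale inputs the same call would use a 28x28 `trigger` at the same offset, matching the configuration quoted above.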