Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense
Authors: Siyu Luan, Zhenyi Wang, Li Shen, Zonghua Gu, Chao Wu, Dacheng Tao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess the efficacy of the proposed method, we conduct comprehensive experiments aimed at defending against both data-based model extraction (where the attacker uses similar real data to query the victim model) and data-free model extraction (where the attacker uses synthetic data only to query the victim model) in both soft-label and hard-label attack settings. Our method is highly adaptable and can be applied seamlessly to various victim model architectures, including both ResNet and large-scale pre-trained Vision Transformer (ViT) models. Its simple design and ease of implementation, using standard linear classifiers without requiring specialized exit classifiers, highlight its exceptional scalability and flexibility. The results demonstrate that our approach consistently surpasses the state-of-the-art (SOTA) defense method, achieving up to a 12% reduction in clone model accuracy. Simultaneously, our method significantly enhances running efficiency compared to SOTA defense methods, achieving a 2× speedup. Additionally, our method outperforms other defense techniques in terms of overall model utility. Crucially, our defense is notably more effective in model extraction scenarios, regardless of whether attackers utilize OOD data or have access to in-distribution data, highlighting its broad applicability. Furthermore, we conducted additional experiments to evaluate our defense against the SOTA model architecture stealing method (Carlini et al., 2024). The results demonstrate that our approach can effectively protect model architecture from theft. |
| Researcher Affiliation | Academia | 1University of Copenhagen, Denmark 2University of Maryland, College Park, USA 3 Shenzhen Campus of Sun Yat-sen University 4Hofstra University, USA 5University at Buffalo, USA 6Nanyang Technological University, Singapore |
| Pseudocode | Yes | We summarize the joint training details in Algorithm 1 in Appendix. In lines 3-4, we randomly sample ID and simulated OOD datasets. In lines 5-8, we calculate the base DNF training loss by Eqs. (6), (7), (8), respectively. Then, we update the Early-Exit neural network V via SGD optimizer with respect to the exit classifier parameters δV. Deployment of DNF: During testing, given the input query x and the array of confidence thresholds r = {r0, r1, ..., rN}, where r0 represents the confidence threshold for the first intermediate exit classifier, and so forth. Starting from the input x and progressing to the position of the first intermediate exit classifier in the model V, the first intermediate exit classifier calculates the softmax probabilities V0(x) across all prediction classes. We denote the probability of the jth class in the prediction at the first intermediate exit classifier as V0,j(x). The largest probability value in V0(x) is denoted as V0,max(x) = max_j V0,j(x). If V0,max(x) ≥ r0, it signifies that the model V is confident in the current result, enabling the early termination of subsequent calculations. If V0,max(x) < r0, the inference process continues. We summarize the DNF testing algorithm in Algorithm 2 in Appendix. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a repository. |
| Open Datasets | Yes | Datasets (1) We evaluate the effectiveness of our method against DFME attack by using MNIST (10 classes) (Deng, 2012), CIFAR-10 (10 classes), CIFAR-100 (100 classes) (Krizhevsky, 2009), and ImageNet-100 (Vinyals et al., 2016) (100 classes), as these datasets are commonly used in existing DFME research. (2) For evaluating the effectiveness of our method against DBME, following Mazeika et al. (2022), we use Caltech256 (Griffin et al., 2007) as the query dataset for victim models trained on both ImageNet200 (200 classes) and CUB200 (Wah et al., 2011). |
| Dataset Splits | No | The paper lists popular datasets (MNIST, CIFAR-10/100, ImageNet-100, Caltech256, ImageNet200, CUB200) which usually come with predefined train/test splits. However, it does not explicitly state the proportions or counts used for its own experimental setup, nor does it explicitly state "we use the standard splits of X dataset" with specific details or citations to those splits. While query budgets are mentioned, these are not the dataset splits for training/validation/testing. |
| Hardware Specification | Yes | All experiments are run on a single NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions software like SMAC3 for Bayesian optimization and optimizers like SGD and Adam, but it does not provide specific version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In our experiments, following (Wang et al., 2023), in the DFME attack the l1 perturbation budget is set to 1.0 for the defense baselines in order to mount a strong defense against model extraction. This means that the l1 norm of the difference between y and ŷ, where y represents the original output probabilities and ŷ represents the modified output, does not exceed 1.0. In the DFME attack setting, following (Truong et al., 2021), we set the query budgets as follows: 2M for MNIST, 20M for CIFAR-10, 200M for CIFAR-100, and 200M for ImageNet-100. In the DBME attack, the query budget is set to 10K for ImageNet200 and 23K for CUB200. We report results as the mean and standard deviation over five runs. All experiments are run on a single NVIDIA RTX A6000 GPU. Due to space limitations, we put the details of training and hyperparameters in Appendix D, the exit classifier architecture in Appendix E, and the OOD generator in Appendix F, respectively. Exit Threshold Selection Guideline: We propose a sample-efficient Bayesian optimization framework for automatically selecting the exit threshold. The detailed selection process is outlined in Appendix H. |
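The threshold-gated early-exit inference quoted in the Pseudocode row can be sketched in PyTorch as follows. This is a minimal illustration only: the backbone/exit structure, class name `EarlyExitInference`, and toy dimensions are assumptions, not the authors' implementation, which uses the specific DNF training and exit-classifier details from the paper's appendices.

```python
import torch
import torch.nn as nn


class EarlyExitInference(nn.Module):
    """Sketch of threshold-gated early-exit inference.

    Each backbone stage i has a linear exit head producing softmax
    probabilities V_i(x); if the top probability V_{i,max}(x) meets the
    threshold r_i, inference terminates early at exit i.
    """

    def __init__(self, blocks, exit_heads, thresholds):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)        # backbone stages of V
        self.exits = nn.ModuleList(exit_heads)     # one linear exit per stage
        self.thresholds = thresholds               # r = {r0, ..., rN}

    @torch.no_grad()
    def forward(self, x):
        # Assumes a single query (batch size 1), as in the per-query
        # deployment description.
        h = x
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            h = block(h)
            probs = torch.softmax(exit_head(h.flatten(1)), dim=-1)  # V_i(x)
            if probs.max().item() >= self.thresholds[i]:  # V_{i,max}(x) >= r_i
                return probs, i  # confident: terminate subsequent computation
        return probs, len(self.blocks) - 1  # fell through to the final exit


# Toy usage: two stages, three classes; threshold 0.0 at the last exit
# guarantees termination there at the latest.
model = EarlyExitInference(
    blocks=[nn.Linear(4, 8), nn.Linear(8, 8)],
    exit_heads=[nn.Linear(8, 3), nn.Linear(8, 3)],
    thresholds=[0.9, 0.0],
)
probs, exit_idx = model(torch.randn(1, 4))
```

Because the exit taken depends on the input's confidence, different queries traverse different depths of the network, which is the source of the per-query output variability the defense exploits.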