Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

Authors: Jian-Ping Mei, Weibin Zhang, Jie Chen, Xuyun Zhang, Tiantian Zhu

AAAI 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Results of extensive experimental studies show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users for both soft- and hard-label settings. We have conducted extensive empirical evaluations with various settings to verify its detection performance, its defense capability in protecting image classification models from model stealing, and its robustness to adaptive attacks.
Researcher Affiliation | Academia | 1 College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, 310023, China; 2 School of Computing, Macquarie University, 4 Research Park Drive, Macquarie Park, NSW 2109.
Pseudocode | No | The paper describes its method through mathematical formulations (e.g., Equation 2 for MSADD(Q)) and conceptual diagrams (Figure 1), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/AI-EXP-group/D-ADD
Open Datasets | Yes | Following (Kariyappa, Prakash, and Qureshi 2021b), we have trained five target models on well-known image datasets, namely MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and Flower-17, for image classification.
Dataset Splits | No | The paper mentions using a 'testing set' for benign queries and gives specific ratios for mixing benign-looking samples in an adaptive-attack scenario (e.g., 'maximum percentage of benign-looking samples is set to be 10% for CIFAR-10, CIFAR-100 and Flower17, and 1% for other two simple tasks'). However, it does not explicitly provide the training, validation, and testing splits for the primary datasets used in the main model training, which are typically required for full reproducibility.
Hardware Specification | No | The paper states 'In our experiment, the response latency caused by defense is less than 5ms per query,' but it does not specify the CPU, GPU, or other hardware used to run the experiments.
Software Dependencies | No | The paper gives training settings ('The clone model is trained using the SGD optimizer for 50 epochs with a cosine annealing schedule and an initial learning rate of 0.1'), but it does not list any specific software or library names with version numbers (e.g., Python, TensorFlow, or PyTorch versions) used in the implementation.
Experiment Setup | Yes | Hyperparameter settings: the training accuracy dropping ratio γ is set to 10^-4, and the sliding window size N to 256 for CIFAR-100 and 64 for all others. Without loss of generality, each user query is assumed to contain one image. The clone model is trained using the SGD optimizer for 50 epochs with a cosine annealing schedule and an initial learning rate of 0.1.
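The clone-model training configuration reported above (SGD, 50 epochs, cosine annealing schedule, initial learning rate 0.1) can be sketched in PyTorch. This is a minimal illustration of those hyperparameters only: the linear model and the random tensors standing in for attack queries and returned labels are placeholder assumptions, not the paper's actual clone architecture or data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder clone model (assumption; the paper's architecture is not shown here).
model = nn.Linear(32, 10)

# Settings taken from the paper's "Hyperparameter settings" paragraph:
# SGD optimizer, initial lr 0.1, 50 epochs, cosine annealing schedule.
epochs = 50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = nn.CrossEntropyLoss()

# Stand-ins for query images and the labels returned by the target model.
x = torch.randn(64, 32)
y = torch.randint(0, 10, (64,))

for epoch in range(epochs):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine-anneal the learning rate from 0.1 toward 0

final_lr = scheduler.get_last_lr()[0]
```

After 50 scheduler steps with T_max=50, the learning rate has been annealed from 0.1 down to (approximately) zero, which is the defining behavior of the cosine schedule the paper cites.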