CAMH: Advancing Model Hijacking Attack in Machine Learning

Authors: Xing He, Jiahao Chen, Yuwen Pu, Qingming Li, Chunyi Zhou, Yingcai Wu, Jinbao Li, Shouling Ji

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate CAMH across multiple benchmark datasets and network architectures, demonstrating its potent attack effectiveness while ensuring minimal degradation in the performance of the original task. We evaluate the effectiveness of CAMH on multiple benchmark tasks (including MNIST, SVHN, GTSRB, CIFAR10, and CIFAR100) across various network architectures (including ResNet18 and ResNet34). The experimental results demonstrate that CAMH successfully executes hijacking tasks, with the ER values exceeding 85% for most datasets and the CR maintained at approximately 98% for all datasets. To assess the impact of noise and SOL-based projections, we have undertaken an ablation study. The findings detailed in Table 1 indicate that when the attacker employs only the optimized noise, the ER for the hijacking task escalates. When the synchronized layer is used exclusively, there is an improvement in performance. The hijacking performance of the model is notably enhanced when both strategies are combined, culminating in the highest ER.
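The quoted passage reports ER and CR as percentages. Reading ER as the hijacked model's accuracy on the hijacking task and CR as its retained accuracy on the original task (an assumption; the quoted text does not define the acronyms), both reduce to an accuracy-style metric. A minimal sketch under that assumption:

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Hypothetical toy predictions, not results from the paper:
# ER-style reading: accuracy on the hijacking task's labels
er = accuracy([1, 0, 2, 1], [1, 0, 2, 2])  # 0.75
# CR-style reading: accuracy on the original task after hijacking
cr = accuracy([3, 3, 5, 7], [3, 3, 5, 7])  # 1.0
```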
Researcher Affiliation Academia Xing He1*, Jiahao Chen1*, Yuwen Pu1, Qingming Li1, Chunyi Zhou1, Yingcai Wu1, Jinbao Li2,3, Shouling Ji1 1College of Computer Science and Technology, Zhejiang University 2Shandong Artificial Intelligence Institute 3School of Mathematics and Statistics, Qilu University of Technology
Pseudocode Yes Algorithm 1 in the Appendix shows the dual-loop optimization training.
Open Source Code No The paper mentions Hugging Face and Model Zoo as marketplaces for pre-trained models and third-party services, but there is no explicit statement or link indicating that the authors have released their own source code for the CAMH methodology.
Open Datasets Yes We mainly use MNIST (LeCun 1998), SVHN (Netzer et al. 2011), GTSRB (Stallkamp et al. 2011), CIFAR10 (Krizhevsky, Hinton et al. 2009) and CIFARm datasets for experiments. The specific descriptions of MNIST, SVHN, GTSRB, and CIFAR10 are shown in the Appendix. CIFARm is a dataset we defined ourselves. CIFARm denotes the datasets derived from CIFAR100 (Krizhevsky, Hinton et al. 2009).
Dataset Splits Yes We mainly use MNIST (LeCun 1998), SVHN (Netzer et al. 2011), GTSRB (Stallkamp et al. 2011), CIFAR10 (Krizhevsky, Hinton et al. 2009) and CIFARm datasets for experiments. The specific descriptions of MNIST, SVHN, GTSRB, and CIFAR10 are shown in the Appendix. CIFARm is a dataset we defined ourselves. CIFARm denotes the datasets derived from CIFAR100 (Krizhevsky, Hinton et al. 2009). These datasets are formed by randomly selecting m out of the 100 classes available in CIFAR100. The CIFAR100 dataset comprises 100 distinct classes, each class encompassing precisely 600 color images of 32×32 pixel resolution, amounting to a grand total of 60,000 images.
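The CIFARm construction described above (randomly select m of CIFAR100's 100 classes and keep only their samples) can be sketched as follows. The paper releases no reference implementation, so the helper name, the seed, and the relabeling-to-0..m-1 step are assumptions for illustration:

```python
import random

def make_cifarm(dataset, m, seed=0):
    """Hypothetical CIFARm builder: keep samples from m randomly chosen
    classes of a 100-class dataset. `dataset` is an iterable of
    (image, label) pairs with integer labels in 0..99."""
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(100), m))      # the m retained classes
    remap = {c: i for i, c in enumerate(kept)}    # relabel to 0..m-1 (assumption)
    return [(img, remap[y]) for img, y in dataset if y in kept]

# Toy usage with one dummy sample per class (not real CIFAR100 data).
toy = [(None, y) for y in range(100)]
subset = make_cifarm(toy, m=10)
```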
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments. It only mentions the models and training parameters.
Software Dependencies No The paper mentions using ResNet18 and ResNet34 models, which are "original models from Torchvision," but it does not specify the version of Torchvision or any other software dependencies with version numbers.
Experiment Setup Yes For ResNet18, we employed 150 training epochs using the SGD optimizer with an initial learning rate of 0.1 and a batch size of 64. Considering the increased complexity and susceptibility to overfitting of the ResNet34 model, we introduced a dropout rate of 0.4 during training and extended the training epochs to 200; other settings remain the same as for ResNet18. This configuration ensures the model achieves high accuracy on the original task while providing a stable foundation for subsequent hijacking attack experiments.
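The hyperparameters quoted above can be collected into a small config fragment. The values (epochs, optimizer, learning rate, batch size, dropout) come from the paper; the dictionary schema and field names are my own, since the authors publish no configuration files:

```python
# Per-architecture training settings as reported in the paper.
# Dropout of 0.4 applies only to ResNet34; field names are assumptions.
TRAIN_CONFIGS = {
    "resnet18": {"epochs": 150, "optimizer": "SGD", "lr": 0.1,
                 "batch_size": 64, "dropout": 0.0},
    "resnet34": {"epochs": 200, "optimizer": "SGD", "lr": 0.1,
                 "batch_size": 64, "dropout": 0.4},
}

def config_for(arch):
    """Look up the reported training settings for an architecture name."""
    return TRAIN_CONFIGS[arch.lower()]
```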