Robust SAM: On the Adversarial Robustness of Vision Foundation Models

Authors: Jiahuan Long, Zhengqin Xu, Tingsong Jiang, Wen Yao, Shuai Jia, Chao Ma, Xiaoqian Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our cross-prompt attack method outperforms previous approaches in terms of attack success rate on both SAM and SAM 2. By adapting only 512 parameters, we achieve at least a 15% improvement in mean intersection over union (mIoU) against various adversarial attacks. Compared to previous defense methods, our approach enhances the robustness of SAM while maximally maintaining its original performance.
Researcher Affiliation | Academia | 1 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, 2 Defense Innovation Institute, Chinese Academy of Military Science, 3 Intelligent Game and Decision Laboratory, 4 Chinese Academy of Military Science, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodologies using text and mathematical formulations (e.g., equations for mIoU, L_adv, L_def) and figures, but does not include a distinct 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released or provide a link to a code repository.
Open Datasets | Yes | To evaluate the robustness of SAM under different types of prompts (i.e., point and box prompts), we randomly sample 2000 images from the SA-1B (Kirillov et al. 2023), VOC (Everingham et al. 2010), COCO (Lin et al. 2014), and DAVIS (Pont-Tuset et al. 2017) datasets.
Dataset Splits | Yes | The VOC dataset is split into 70% for training and 30% for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer (Kingma and Ba 2014) but does not specify version numbers for other key software components, libraries, or programming languages.
Experiment Setup | Yes | For the attack setting, we set the total number of iteration steps to 20, and the perturbation intensity ε to 16/255 for PPA, BPA, and our cross-prompt attack. The attack feature A is set to the negative values of the key features, and K in the TOPK function is set to 5. For all few-parameter adaptation methods, we randomly sample 70% of the adversarial examples in the VOC dataset for adapting SAM. Our training employs the Adam optimizer (Kingma and Ba 2014). The initial learning rate is set to 1.0 × 10^-3, and the weight decay is 5 × 10^-5 with one image per minibatch. The number of training epochs is set to 500.
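The split and setup details reported above can be encoded in a short configuration sketch. This is a minimal illustration using only the hyperparameters quoted from the paper; the function and constant names (e.g., `split_voc`, `TRAIN_CONFIG`) are hypothetical and not taken from the authors' (unreleased) code.

```python
import random

# Attack setting reported for PPA, BPA, and the cross-prompt attack.
ATTACK_STEPS = 20
EPSILON = 16 / 255   # perturbation intensity epsilon
TOP_K = 5            # K in the TOPK feature-selection function

# Adaptation (defense) training setting, as stated in the paper.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1.0e-3,
    "weight_decay": 5e-5,
    "batch_size": 1,     # one image per minibatch
    "epochs": 500,
}

def split_voc(image_ids, train_frac=0.70, seed=0):
    """Randomly split VOC (adversarial) examples 70/30 for train/eval."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

train_ids, eval_ids = split_voc(range(1000))
print(len(train_ids), len(eval_ids))  # 700 300
```

The fixed seed makes the split reproducible across runs, which is the property this reproducibility variable is checking for.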