Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Defending LVLMs Against Vision Attacks Through Partial-Perception Supervision

Authors: Qi Zhou, Dongxia Wang, Tianlin Li, Yun Lin, Yang Liu, Jin Song Dong, Qing Guo

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments show our method outperforms the baseline, cutting the average attack success rate by 76.3% across six datasets on three popular models. Our code is available at https://github.com/tools-only/DPS
Researcher Affiliation | Collaboration | 1College of Control Science and Engineering, Zhejiang University, China; 2Huzhou Institute of Industrial Control Technology, China; 3School of Computer Science and Engineering, Nanyang Technological University, Singapore; 4School of Computer Science, Shanghai Jiao Tong University, China; 5School of Computing, National University of Singapore, Singapore; 6IHPC and CFAR, Agency for Science, Technology, and Research, Singapore. Correspondence to: Dongxia Wang <EMAIL>, Tianlin Li <EMAIL>.
Pseudocode | No | The paper describes the method steps only in prose, in Section 4.2 'Detailed Design of DPS', without a structured pseudocode block or algorithm figure.
Open Source Code | Yes | Our code is available at https://github.com/tools-only/DPS
Open Datasets | Yes | To comprehensively evaluate the performance of different defense methods, we considered various datasets with a range of attack types, which include the following datasets: Challenging Misleading Datasets: RTA-100 (Azuma & Matsui, 2023) and the MultiTrust Misleading Dataset (Zhang et al., 2024b). Misleading Attack Datasets: Self-Gen dataset constructed by self-generated typographic attacks (Qraitem et al., 2024). Typographical Jailbreak Datasets: MM-SafetyBench (Liu et al., 2025) and HADES (Li et al., 2024d). Optimization-based Jailbreak Adversarial Examples: We utilize the approach from Visual Attack (Qi et al., 2024), which injects safety-aware adversarial noise into clean images. In addition, we utilize the MM-Vet benchmark (Yu et al., 2023) to evaluate the standard performance of various defense methods in general scenarios. Please refer to Appendix A for more details.
Dataset Splits | No | The paper mentions using a 'MM-Vet test dataset Dvet' and filtering out 'adversarial samples that successfully attacked the original model across all datasets' to create 'six adversarial sample datasets for evaluation'. However, it does not provide specific details on the train/validation/test splits of the original datasets or the methodology for partitioning the data into these adversarial sample datasets beyond filtering.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions models like Qwen-VL-Plus, GPT-4o-Mini, Gemini-1.5-Flash, and Qwen2.5-VL-32B, and refers to GPT-4o for evaluation, but does not specify software dependencies (libraries, frameworks, or solvers) with version numbers used for implementation or training.
Experiment Setup | Yes | For SmoothVLM, we set the perturbation rate to 20%, the setting that performs best in its original paper, and use 10 LVLMs for majority voting. For DPS and LS-DPS, we generated three partial image copies using center-cropping, random-cropping, and adaptive cropping strategies. Center-cropping extracts a half-size image from the center of the original, random-cropping extracts 1/4 to 1/2 size images from random locations, and adaptive cropping employs an LVLM to extract the main objects from the image. Furthermore, we choose random-cropping as the standard method to provide more detailed comparisons, referred to as Standard.
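The cropping strategies and the majority-voting step quoted above can be sketched as plain crop-box arithmetic. This is a minimal illustration, not the authors' code: the function names are my own, I read "1/4 to 1/2 size" as a scale factor applied to each side, and the adaptive (LVLM-driven) crop is omitted because it requires a model call.

```python
import random
from collections import Counter

def center_crop_box(width, height):
    """Half-size crop box centered on the image (the paper's center-cropping).

    Returns (left, top, right, bottom), suitable for e.g. PIL's Image.crop.
    """
    crop_w, crop_h = width // 2, height // 2
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)

def random_crop_box(width, height, rng=random):
    """Crop box of 1/4 to 1/2 the original side length at a random location
    (the paper's random-cropping; the scale interpretation is an assumption).
    """
    scale = rng.uniform(0.25, 0.5)
    crop_w, crop_h = int(width * scale), int(height * scale)
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)

def majority_vote(responses):
    """Pick the most common response, as in SmoothVLM's voting over 10 LVLM
    outputs (simplified: ties resolve arbitrarily)."""
    return Counter(responses).most_common(1)[0][0]
```

For example, `center_crop_box(400, 300)` yields `(100, 75, 300, 225)`, a 200x150 crop centered in a 400x300 image; each box could then be passed to an image library's crop routine to produce the partial copies DPS feeds to the model.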