TRUST-VLM: Thorough Red-Teaming for Uncovering Safety Threats in Vision-Language Models

Authors: Kangjie Chen, Muyang Li, Guanlin Li, Shudong Zhang, Shangwei Guo, Tianwei Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that TRUST-VLM not only outperforms traditional red-teaming techniques in generating diverse and effective adversarial cases but also provides actionable insights for model improvement. To evaluate the effectiveness of TRUST-VLM, we conduct comprehensive experiments on four open-source models (LLaVA-v1.5-13B, Qwen2-VL-7B, DeepSeek-VL-7B, and Phi-3-Vision-128K) and a commercial model (GPT-4o) with six harmful categories.
Researcher Affiliation | Academia | 1 Digital Trust Center, Nanyang Technological University, Singapore; 2 College of Computing and Data Science, Nanyang Technological University, Singapore; 3 School of Computer Science and Technology, Xidian University, China; 4 College of Computer Science, Chongqing University, China. Correspondence to: Shangwei Guo <EMAIL>.
Pseudocode | No | The paper describes the methodology in prose (Section 4) and provides ICL templates in Appendix F, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper neither states that its own source code will be released nor links to a code repository.
Open Datasets | Yes | To evaluate the effectiveness of TRUST-VLM, we conduct comprehensive experiments on four open-source models (LLaVA-v1.5-13B, Qwen2-VL-7B, DeepSeek-VL-7B, and Phi-3-Vision-128K) and a commercial model (GPT-4o) with six harmful categories. We compare it against three types of baselines: an automatic red-teaming method (Arondight (Liu et al., 2024)), benchmark-based red-teaming methods (JailbreakV-28K (Luo et al., 2024) and Red-Teaming VLM (RTVLM) (Li et al., 2024c)), and a jailbreak attack (HADES (Li et al., 2024d)).
Dataset Splits | No | The paper describes generating test cases for evaluating target VLMs (e.g., 'We generate 200 test cases for each red-teaming method on LLaVA-v1.5-13B'), but it does not specify conventional training/validation/test splits for any model developed or trained within the paper.
Hardware Specification | Yes | During our experiments, we use 4 A6000 to launch the red-teaming pipeline.
Software Dependencies | No | The paper names specific models such as 'Llama-3.1-70B-Instruct (Meta, 2024)', 'Stable Diffusion 3 Medium (Esser et al., 2024)', and 'BART-Large-MNLI (Facebook, 2024)', but it does not give version numbers for general software dependencies such as programming languages, libraries (e.g., PyTorch, TensorFlow), or other frameworks.
Experiment Setup | Yes | We introduce the detailed inference settings for VLMs and the red-teaming model in our framework in Table 8 and Table 9, respectively. A threshold parameter ϵ is introduced to filter out less confident results and prioritize reliable predictions. In the TRUST-VLM framework, this threshold is set to 0.75 by default. In our experiments, we use t = 2 by default to balance the information abundance and the implication of the harmful concept. For each aforementioned category, TRUST-VLM performs 50 rounds to generate test cases.
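The confidence-threshold filtering described above (ϵ = 0.75) can be illustrated with a minimal sketch. This is not the authors' code: the function `filter_confident` and the prediction format are assumptions, modeled on the `{'label': ..., 'score': ...}` output shape of zero-shot classifiers such as BART-Large-MNLI, which the paper mentions using.

```python
# Hedged sketch (not the paper's implementation): keep only classifier
# predictions whose confidence meets the TRUST-VLM default threshold.
EPSILON = 0.75  # default threshold reported in the paper

def filter_confident(predictions, epsilon=EPSILON):
    """Return predictions whose 'score' is at least `epsilon`.

    `predictions` mimics a zero-shot classifier's output: a list of
    dicts with 'label' and 'score' keys. Less confident results are
    dropped so that only reliable predictions are acted on.
    """
    return [p for p in predictions if p["score"] >= epsilon]

# Example: only the first prediction clears the 0.75 bar.
preds = [
    {"label": "harmful", "score": 0.91},
    {"label": "harmful", "score": 0.62},
    {"label": "benign", "score": 0.40},
]
kept = filter_confident(preds)
```

Raising ϵ trades recall for precision: fewer candidate test cases survive filtering, but those that do are more reliably judged.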