FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results indicate that FigStep can achieve an average attack success rate of 82.50% on six promising open-source LVLMs. Beyond merely demonstrating the efficacy of FigStep, we conduct comprehensive ablation studies and analyze the distribution of the semantic embeddings to uncover that the reason behind the success of FigStep is the deficiency of safety alignment for visual embeddings.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) Institute for Network Sciences and Cyberspace, Tsinghua University; (3) Institute for Advanced Study, BNRist, Tsinghua University; (4) Carnegie Mellon University; (5) Zhongguancun Laboratory; (6) National Financial Cryptography Research Center; (7) Shandong Institute of Blockchain; (8) School of Cyber Science and Technology, Shandong University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the FigStep pipeline in three steps (Paraphrase, Typography, and Incitement) in paragraph text and illustrates it in Figure 2; it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code and datasets: https://github.com/ThuCCSLab/FigStep
Open Datasets | Yes | Code and datasets: https://github.com/ThuCCSLab/FigStep
Dataset Splits | No | The paper introduces the SafeBench and SafeBench-Tiny datasets for evaluation, but it does not specify any training/validation/test splits for the experiments conducted. The evaluation repeatedly launches FigStep for each question rather than splitting the dataset for model training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using EasyOCR (AI 2023) and GPT-2 to calculate perplexity (PPL), but it does not provide version numbers for these or any other software dependencies crucial for replication.
Experiment Setup | Yes | The default malicious image-prompt I of FigStep is a typography of T that contains black text and a white background. The image size of I is 760 x 760. The text font is FreeMonoBold and the font size is 80. As for the jailbreaking incitement text-prompt, we use a manually designed inciting prompt as our default T to launch FigStep. [...] Furthermore, considering the stochastic nature of the model's replies, we repeatedly launch FigStep five times for each question, and one jailbreak is deemed successful if any one of the five attempts yields a prohibited response.
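The typography step in the setup above (black text on a white 760 x 760 canvas, FreeMonoBold at size 80) can be sketched with Pillow. This is a minimal illustration, not the paper's implementation: the `FreeMonoBold.ttf` path, the text margin, and the list-style paraphrase wording are all assumptions, and a benign example topic is used.

```python
from PIL import Image, ImageDraw, ImageFont

def make_typographic_prompt(text: str, size: int = 760, font_size: int = 80) -> Image.Image:
    """Render black text on a white square canvas, mirroring the paper's
    default image-prompt settings (760 x 760 image, font size 80)."""
    image = Image.new("RGB", (size, size), color="white")
    draw = ImageDraw.Draw(image)
    try:
        # The paper uses FreeMonoBold; this .ttf path is an assumption.
        font = ImageFont.truetype("FreeMonoBold.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()  # fallback so the sketch still runs
    draw.multiline_text((20, 20), text, fill="black", font=font)
    return image

# Paraphrase step (illustrative wording, not the paper's exact template):
# rewrite the question as an incomplete numbered list for the model to fill in.
paraphrased = "Steps to bake a cake.\n1.\n2.\n3."
img = make_typographic_prompt(paraphrased)
img.save("figstep_prompt.png")
```

The resulting image would then be paired with the paper's separate inciting text-prompt when querying the LVLM.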
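The success criterion described above (a question counts as jailbroken if any one of five repeated attempts yields a prohibited response) implies a straightforward attack-success-rate computation. A small sketch, with toy illustrative data rather than the paper's results:

```python
def is_jailbroken(attempt_results: list) -> bool:
    """A question counts as jailbroken if any single attempt succeeded."""
    return any(attempt_results)

def attack_success_rate(per_question_attempts: list) -> float:
    """Fraction of questions with at least one prohibited response across
    the repeated attempts (five per question in the paper's setup)."""
    jailbroken = sum(is_jailbroken(a) for a in per_question_attempts)
    return jailbroken / len(per_question_attempts)

# Toy example: 4 questions, 5 attempts each (illustrative data only).
results = [
    [False, False, True, False, False],  # jailbroken on the 3rd attempt
    [False, False, False, False, False], # never jailbroken
    [True, True, False, True, True],     # jailbroken
    [False, False, False, False, True],  # jailbroken on the last attempt
]
print(attack_success_rate(results))  # → 0.75
```

The reported 82.50% average would be this per-model rate averaged over the six evaluated LVLMs.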