FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results indicate that FigStep can achieve an average attack success rate of 82.50% on six promising open-source LVLMs. Beyond merely demonstrating the efficacy of FigStep, we conduct comprehensive ablation studies and analyze the distribution of the semantic embeddings to uncover that the reason behind the success of FigStep is the deficiency of safety alignment for visual embeddings.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) Institute for Network Sciences and Cyberspace, Tsinghua University; (3) Institute for Advanced Study, BNRist, Tsinghua University; (4) Carnegie Mellon University; (5) Zhongguancun Laboratory; (6) National Financial Cryptography Research Center; (7) Shandong Institute of Blockchain; (8) School of Cyber Science and Technology, Shandong University. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the FigStep pipeline in three steps (Paraphrase, Typography, and Incitement) in paragraph text and illustrates it in Figure 2; it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code and datasets: https://github.com/ThuCCSLab/FigStep
Open Datasets | Yes | Code and datasets: https://github.com/ThuCCSLab/FigStep
Dataset Splits | No | The paper introduces the SafeBench and SafeBench-Tiny datasets for evaluation, but it does not specify any training/validation/test splits for the experiments conducted. The evaluation repeatedly launches FigStep for each question rather than splitting the dataset for model training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using EasyOCR (AI 2023) and GPT-2 to calculate perplexity (PPL), but it does not provide version numbers for these or any other software dependencies crucial for replication.
Experiment Setup | Yes | The default malicious image-prompt I of FigStep is a typography of T that contains black text and a white background. The image size of I is 760 x 760. The text font is FreeMonoBold and the font size is 80. As for the jailbreaking incitement text-prompt, we use a manually designed inciting prompt as our default T to launch FigStep. [...] Furthermore, considering the stochastic nature of the model's replies, we repeatedly launch FigStep five times for each question, and one jailbreak is deemed successful if any one of the five attempts yields a prohibited response.
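The typography step in the setup above (black text on a white 760 x 760 canvas, FreeMonoBold at size 80) can be sketched with Pillow. This is a minimal illustration, not the paper's implementation: the `FreeMonoBold.ttf` path, the text margin, and the list-style paraphrase wording are all assumptions, and a benign example topic is used.

```python
from PIL import Image, ImageDraw, ImageFont

def make_typographic_prompt(text: str, size: int = 760, font_size: int = 80) -> Image.Image:
    """Render black text on a white square canvas, mirroring the paper's
    default image-prompt settings (760 x 760 image, font size 80)."""
    image = Image.new("RGB", (size, size), color="white")
    draw = ImageDraw.Draw(image)
    try:
        # The paper uses FreeMonoBold; this .ttf path is an assumption.
        font = ImageFont.truetype("FreeMonoBold.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()  # fallback so the sketch still runs
    draw.multiline_text((20, 20), text, fill="black", font=font)
    return image

# Paraphrase step (illustrative wording, not the paper's exact template):
# rewrite the question as an incomplete numbered list for the model to fill in.
paraphrased = "Steps to bake a cake.\n1.\n2.\n3."
img = make_typographic_prompt(paraphrased)
img.save("figstep_prompt.png")
```

The resulting image would then be paired with the paper's separate inciting text-prompt when querying the LVLM.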
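The success criterion described above (a question counts as jailbroken if any one of five repeated attempts yields a prohibited response) implies a straightforward attack-success-rate computation. A small sketch, with toy illustrative data rather than the paper's results:

```python
def is_jailbroken(attempt_results: list) -> bool:
    """A question counts as jailbroken if any single attempt succeeded."""
    return any(attempt_results)

def attack_success_rate(per_question_attempts: list) -> float:
    """Fraction of questions with at least one prohibited response across
    the repeated attempts (five per question in the paper's setup)."""
    jailbroken = sum(is_jailbroken(a) for a in per_question_attempts)
    return jailbroken / len(per_question_attempts)

# Toy example: 4 questions, 5 attempts each (illustrative data only).
results = [
    [False, False, True, False, False],  # jailbroken on the 3rd attempt
    [False, False, False, False, False], # never jailbroken
    [True, True, False, True, True],     # jailbroken
    [False, False, False, False, True],  # jailbroken on the last attempt
]
print(attack_success_rate(results))  # → 0.75
```

The reported 82.50% average would be this per-model rate averaged over the six evaluated LVLMs.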