FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Authors: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results indicate that FigStep can achieve an average attack success rate of 82.50% on six promising open-source LVLMs. Not merely to demonstrate the efficacy of FigStep, we conduct comprehensive ablation studies and analyze the distribution of the semantic embeddings to uncover that the reason behind the success of FigStep is the deficiency of safety alignment for visual embeddings. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Technology, Tsinghua University, 2Institute for Network Sciences and Cyberspace, Tsinghua University, 3Institute for Advanced Study, BNRist, Tsinghua University, 4Carnegie Mellon University, 5Zhongguancun Laboratory, 6National Financial Cryptography Research Center, 7Shandong Institute of Blockchain, 8School of Cyber Science and Technology, Shandong University EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the pipeline of FigStep in three steps (Paraphrase, Typography, and Incitement) in paragraph text and uses Figure 2 for illustration. It does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code, Datasets: https://github.com/ThuCCSLab/FigStep |
| Open Datasets | Yes | Code, Datasets: https://github.com/ThuCCSLab/FigStep |
| Dataset Splits | No | The paper introduces the SafeBench and SafeBench-Tiny datasets for evaluation, but it does not specify any training/test/validation splits for these datasets for the experiments conducted. The evaluation involves repeatedly launching FigStep for each question, not splitting the dataset for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'EasyOCR (AI 2023)' and 'GPT-2 to calculate the PPL' but does not provide specific version numbers for these or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | The default malicious image-prompt I of FigStep is a typography of T that contains black text and a white background. The image size of I is 760 × 760. The text font is FreeMonoBold and the font size is 80. As for the jailbreaking incitement text-prompt, we use a manually designed inciting prompt as our default T to launch FigStep. [...] Furthermore, considering the stochastic nature of the model's replies, we repeatedly launch FigStep five times for each question, and one jailbreak could be deemed successful if any one of five attempts could yield a prohibited response. |
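The typography step in the setup above (a 760 × 760 white canvas with black FreeMonoBold text at size 80) can be sketched with Pillow. This is a minimal reconstruction from the parameters stated in the paper, not the authors' released code; the font file path, text placement, and line handling are assumptions.

```python
# Sketch of FigStep-style typographic image rendering, assuming Pillow is
# available. Canvas size, colors, font family, and font size follow the
# paper's stated defaults; layout details are guesses.
from PIL import Image, ImageDraw, ImageFont


def make_typographic_prompt(text: str, size: int = 760, font_size: int = 80) -> Image.Image:
    """Render `text` in black on a white square canvas."""
    image = Image.new("RGB", (size, size), color="white")
    draw = ImageDraw.Draw(image)
    try:
        # FreeMonoBold ships with GNU FreeFont; the exact path varies by system.
        font = ImageFont.truetype("FreeMonoBold.ttf", font_size)
    except OSError:
        # Fall back to Pillow's built-in font if FreeMonoBold is not installed.
        font = ImageFont.load_default()
    # Draw the (already line-broken) text near the top-left corner.
    draw.multiline_text((10, 10), text, fill="black", font=font)
    return image


if __name__ == "__main__":
    img = make_typographic_prompt("Steps to bake a cake:\n1.\n2.\n3.")
    img.save("prompt.png")
```

The numbered-list formatting of the rendered text mirrors the paper's Paraphrase step, in which the original question is rewritten as a list whose items the model is incited to complete.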