Perception-Guided Jailbreak Against Text-to-Image Models

Authors: Yihao Huang, Le Liang, Tianlin Li, Xiaojun Jia, Run Wang, Weikai Miao, Geguang Pu, Yang Liu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experiments conducted on six open-source models and commercial online services with thousands of prompts have verified the effectiveness of PGJ."
Researcher Affiliation | Collaboration | 1 Nanyang Technological University, Singapore; 2 East China Normal University, China; 3 Wuhan University, China; 4 Key Laboratory of Cyberspace Security, Ministry of Education, China; 5 Shanghai Trusted Industrial Control Platform Co., Ltd., China
Pseudocode | No | The paper describes the steps for "Unsafe word selection" and "Word substitution" using LLM instructions in prose with examples, but does not present a formally labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement offering access to the source code for the proposed PGJ method, nor does it provide a link to a code repository.
Open Datasets | No | "Thus we exploit GPT4 to generate a dataset with 1,000 prompts for five classical NSFW types: discrimination, illegal, pornographic, privacy, and violent." The paper does not provide a link, DOI, or specific repository name for accessing this generated dataset.
Dataset Splits | Yes | "We select 20 prompts for each NSFW type, a total of 100 prompts. ... Each NSFW type is represented by 200 prompts."
Hardware Specification | Yes | "All the experiments were run on an Ubuntu system with an NVIDIA A6000 Tensor Core GPU of 48G RAM."
Software Dependencies | No | The paper does not mention any specific software dependencies or libraries with their version numbers.
Experiment Setup | Yes | "Victim T2I Models. We adopt six popular T2I models as the victims of our attack. They are DALL·E 2 (OpenAI 2021), DALL·E 3 (OpenAI 2023a), Cogview3 (Zhipu 2024), SDXL (Podell et al. 2023), Tongyiwanxiang (Ali 2023b), and Hunyuan (Tencent 2024). ... Datasets. ... We exploit GPT4 to generate a dataset with 1,000 prompts for five classical NSFW types: discrimination, illegal, pornographic, privacy, and violent. ... Baselines. ... Evaluation metrics. We use four metrics to evaluate the experiment. ❶ We use the attack success rate (ASR) metric... ❷ We use the semantic consistency (SC) metric... ❸ We use prompt perplexity (PPL) as a metric... ❹ We use the Inception Score (IS)..."
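The Pseudocode row above notes that the attack's two steps ("Unsafe word selection" and "Word substitution") are described only in prose. The flow could be sketched roughly as below; this is a hypothetical illustration, not the authors' code, and the `ask_llm` helper is a canned stub standing in for a real LLM call so the example runs offline.

```python
# Rough sketch of the two-step procedure described in the review above.
# `ask_llm` is a hypothetical helper, stubbed with canned answers here.

def ask_llm(instruction: str) -> str:
    """Stub LLM: returns fixed answers for this demo prompt only."""
    canned = {
        "select": "blood",                     # step 1: unsafe word found
        "substitute": "red watercolor paint",  # step 2: perceptually similar safe phrase
    }
    key = "select" if "identify" in instruction.lower() else "substitute"
    return canned[key]

def perception_guided_rewrite(prompt: str) -> str:
    # Step 1: ask the LLM to identify an unsafe word in the prompt.
    unsafe_word = ask_llm(f"Identify the unsafe word in: {prompt}")
    # Step 2: ask for a safe phrase that is visually similar to it.
    safe_phrase = ask_llm(
        f"Give a safe phrase that is visually similar to '{unsafe_word}'."
    )
    return prompt.replace(unsafe_word, safe_phrase)

print(perception_guided_rewrite("a floor covered in blood"))
# -> a floor covered in red watercolor paint
```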
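Two of the metrics quoted in the Experiment Setup row have standard closed forms. A minimal sketch (not the paper's evaluation code), assuming ASR is the fraction of adversarial prompts judged successful and PPL is computed from per-token log-probabilities:

```python
import math

def attack_success_rate(successes: list[bool]) -> float:
    """ASR: fraction of adversarial prompts that produced the target image."""
    return sum(successes) / len(successes)

def perplexity(token_log_probs: list[float]) -> float:
    """PPL = exp(-mean log p(token)); lower means more natural-looking text."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

print(attack_success_rate([True, True, False, True]))  # 0.75
print(round(perplexity([math.log(0.25)] * 8), 2))      # 4.0
```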