SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Authors: Jaehong Yoon, Shoubin Yu, Vaidehi Ramesh Patil, Huaxiu Yao, Mohit Bansal

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, SAFREE achieves state-of-the-art performance in suppressing unsafe content in T2I generation (reducing it by 22% across 5 datasets) compared to other training-free methods and effectively filters targeted concepts, e.g., specific artist styles, while maintaining high-quality output. It also shows competitive results against training-based methods. We further extend SAFREE to various T2I backbones and T2V tasks, showcasing its flexibility and generalization.
Researcher Affiliation | Academia | Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal (UNC Chapel Hill)
Pseudocode | No | The paper describes the methodology in detailed text sections (e.g., Sections 3.1 to 3.4) and provides a framework illustration in Figure 2. However, it does not include an explicitly labeled pseudocode or algorithm block.
Open Source Code | Yes | REPRODUCIBILITY STATEMENT: This paper fully discloses all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions. To maximize reproducibility, we have included our code in the supplementary material. LICENSE INFORMATION: We will make our code publicly accessible.
Open Datasets | Yes | We use Stable Diffusion-v1.4 (SD-v1.4) (Rombach et al., 2022) as the primary T2I backbone, following recent work (Gandikota et al., 2023; 2024; Gong et al., 2024). All methods are tested on adversarial prompts from red-teaming methods: I2P (Schramowski et al., 2023), P4D (Chin et al., 2024), Ring-a-Bell (Tsai et al., 2024), MMA-Diffusion (Yang et al., 2024a), and UnlearnDiff (Zhang et al., 2023b). For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories.
Dataset Splits | Yes | For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories. Among these, we randomly select 1k samples for evaluating FID and TIFA.
Hardware Specification | Yes | All experiments are tested on a single NVIDIA A6000 GPU, with 100 steps, and with a setting that removes the nudity concept.
Software Dependencies | No | The paper mentions various models and frameworks like Stable Diffusion-v1.4, SDXL, SD-v3, ZeroScope (T2V), and CogVideoX, and refers to existing literature for other methods. However, it does not explicitly provide specific version numbers for ancillary software dependencies such as programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA versions used for the authors' implementation.
Experiment Setup | Yes | We set α = 0.01 for all experiments in this paper, demonstrating the robustness of our approach to α across T2I generation tasks with varying concepts. Here γ is a hyperparameter (γ = 10 throughout the paper) and cos represents cosine similarity; t denotes the self-validating threshold that determines the number of denoising steps to which the proposed safeguard is applied. All experiments are tested on a single A6000, 100 steps, and with a setting that removes the nudity concept.
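The quoted setup names cosine similarity, a hyperparameter γ, and a self-validating threshold, but this report does not reproduce the paper's formula. As a purely illustrative sketch of the general idea (flagging prompt tokens whose embeddings sit unusually close to a toxic-concept embedding, with a γ-scaled adaptive threshold), the function name, the thresholding rule, and the toy embeddings below are assumptions for illustration, not SAFREE's actual method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def flag_toxic_tokens(token_embs, concept_emb, gamma=10.0):
    """Flag tokens whose embedding is unusually close to a toxic-concept
    embedding. The adaptive threshold (mean similarity plus a gamma-scaled
    margin) is a hypothetical stand-in for the paper's self-validating
    threshold, used here only to show how gamma could modulate sensitivity."""
    sims = np.array([cosine(t, concept_emb) for t in token_embs])
    threshold = sims.mean() + sims.std() / gamma  # hypothetical rule
    return sims > threshold

# Toy example: token 0 is aligned with the concept, tokens 1-2 are not.
concept = np.array([1.0, 0.0])
tokens = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.1, 1.0])]
mask = flag_toxic_tokens(tokens, concept)  # -> [True, False, False]
```

A training-free filter of this shape needs no gradient updates: it operates on frozen text-encoder embeddings at inference time, which matches the paper's claim of being adaptable across T2I and T2V backbones.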