SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Authors: Jaehong Yoon, Shoubin Yu, Vaidehi Ramesh Patil, Huaxiu Yao, Mohit Bansal
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, SAFREE achieves state-of-the-art performance in suppressing unsafe content in T2I generation (reducing it by 22% across 5 datasets) compared to other training-free methods and effectively filters targeted concepts, e.g., specific artist styles, while maintaining high-quality output. It also shows competitive results against training-based methods. We further extend SAFREE to various T2I backbones and T2V tasks, showcasing its flexibility and generalization. |
| Researcher Affiliation | Academia | Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal (UNC Chapel Hill) |
| Pseudocode | No | The paper describes the methodology in detailed text sections (e.g., Section 3.1 to 3.4) and provides a framework illustration in Figure 2. However, it does not include an explicitly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT: This paper fully discloses all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions. To maximize reproducibility, we have included our code in the supplementary material. LICENSE INFORMATION: We will make our code publicly accessible. |
| Open Datasets | Yes | We use Stable Diffusion-v1.4 (SD-v1.4) (Rombach et al., 2022) as the primary T2I backbone, following recent work (Gandikota et al., 2023; 2024; Gong et al., 2024). All methods are tested on adversarial prompts from red-teaming methods: I2P (Schramowski et al., 2023), P4D (Chin et al., 2024), Ring-a-Bell (Tsai et al., 2024), MMA-Diffusion (Yang et al., 2024a), and UnlearnDiff (Zhang et al., 2023b). For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories. |
| Dataset Splits | Yes | For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories. Among these, we randomly select 1k samples for evaluating FID and TIFA. |
| Hardware Specification | Yes | All experiments are tested on a single A6000, 100 steps, and with a setting that removes the nudity concept. |
| Software Dependencies | No | The paper mentions various models and frameworks such as Stable Diffusion-v1.4, SDXL, SD-v3, ZeroScope (T2V), and CogVideoX, and refers to existing literature for other methods. However, it does not explicitly provide version numbers for ancillary software dependencies such as programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA versions used for the authors' implementation. |
| Experiment Setup | Yes | We set α = 0.01 for all experiments in this paper, demonstrating the robustness of our approach to α across T2I generation tasks with varying concepts. Here γ is a hyperparameter (γ = 10 throughout the paper) and cos denotes cosine similarity; t denotes the self-validating threshold that determines the number of denoising steps to which the proposed safeguard is applied. All experiments are tested on a single A6000, 100 steps, and with a setting that removes the nudity concept. |
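The reported setup mentions a self-validating mechanism that uses cosine similarity (scaled by γ = 10) to decide how many denoising steps receive the safeguard. The paper's exact formula is not reproduced in this report, so the sketch below is only one plausible reading: a sigmoid gate over the prompt-concept cosine similarity, scaled to the total step count. The function name `safeguard_steps` and the sigmoid gating are assumptions, not the authors' implementation.

```python
import numpy as np

def safeguard_steps(prompt_emb, concept_emb, total_steps=100, gamma=10.0):
    """Hedged sketch: map prompt/toxic-concept similarity to a number of
    denoising steps that apply the safeguard. The sigmoid gate scaled by
    gamma is an assumption; SAFREE's actual self-validating threshold may
    differ."""
    # Cosine similarity between the prompt embedding and the concept embedding.
    cos = float(
        np.dot(prompt_emb, concept_emb)
        / (np.linalg.norm(prompt_emb) * np.linalg.norm(concept_emb) + 1e-8)
    )
    # Assumed sigmoid gating: higher similarity to the unsafe concept
    # means more steps are safeguarded.
    gate = 1.0 / (1.0 + np.exp(-gamma * cos))
    return int(round(gate * total_steps))

# Usage: a prompt aligned with the concept safeguards (nearly) all 100 steps;
# an orthogonal prompt safeguards only half.
aligned = safeguard_steps(np.ones(4), np.ones(4))
orthogonal = safeguard_steps(np.ones(4), np.array([1.0, -1.0, 1.0, -1.0]))
```

With γ = 10 the gate saturates quickly, which matches the reported robustness of the method to small changes in the similarity score.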