SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Authors: Jaehong Yoon, Shoubin Yu, Vaidehi Ramesh Patil, Huaxiu Yao, Mohit Bansal

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, SAFREE achieves state-of-the-art performance in suppressing unsafe content in T2I generation (reducing it by 22% across 5 datasets) compared to other training-free methods and effectively filters targeted concepts, e.g., specific artist styles, while maintaining high-quality output. It also shows competitive results against training-based methods. We further extend SAFREE to various T2I backbones and T2V tasks, showcasing its flexibility and generalization.
Researcher Affiliation | Academia | Jaehong Yoon, Shoubin Yu, Vaidehi Patil, Huaxiu Yao, Mohit Bansal (UNC Chapel Hill)
Pseudocode | No | The paper describes the methodology in detailed text sections (e.g., Sections 3.1 to 3.4) and provides a framework illustration in Figure 2. However, it does not include an explicitly labeled pseudocode or algorithm block.
Open Source Code | Yes | REPRODUCIBILITY STATEMENT: This paper fully discloses all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions. To maximize reproducibility, we have included our code in the supplementary material. LICENSE INFORMATION: We will make our code publicly accessible.
Open Datasets | Yes | We use Stable Diffusion-v1.4 (SD-v1.4) (Rombach et al., 2022) as the primary T2I backbone, following recent work (Gandikota et al., 2023; 2024; Gong et al., 2024). All methods are tested on adversarial prompts from red-teaming methods: I2P (Schramowski et al., 2023), P4D (Chin et al., 2024), Ring-a-Bell (Tsai et al., 2024), MMA-Diffusion (Yang et al., 2024a), and UnlearnDiff (Zhang et al., 2023b). For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories.
Dataset Splits | Yes | For generation quality, we use FID (Heusel et al., 2017), CLIP score, and TIFA (Hu et al., 2023) on COCO-30k (Lin et al., 2014), evaluating 1k samples. For quantitative evaluation, we use SafeSora (Dai et al., 2024) with 600 toxic prompts across 12 concepts, constructing a benchmark of 296 examples across 5 categories. Among these, we randomly select 1k samples for evaluating FID and TIFA.
Hardware Specification | Yes | All experiments are tested on a single NVIDIA A6000 GPU, with 100 steps, and with a setting that removes the nudity concept.
Software Dependencies | No | The paper mentions various models and frameworks like Stable Diffusion-v1.4, SDXL, SD-v3, ZeroScope (T2V), and CogVideoX, and refers to existing literature for other methods. However, it does not explicitly provide specific version numbers for ancillary software dependencies such as programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or CUDA versions used for the authors' implementation.
Experiment Setup | Yes | We set α = 0.01 for all experiments in this paper, demonstrating the robustness of our approach to α across T2I generation tasks with varying concepts. Here γ is a hyperparameter (γ = 10 throughout the paper) and cos represents cosine similarity; t denotes the self-validating threshold that determines the number of denoising steps to which the proposed safeguard is applied. All experiments are tested on a single A6000, 100 steps, and with a setting that removes the nudity concept.
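The quoted setup names cosine similarity, a hyperparameter γ, and a self-validating threshold, but this report does not reproduce the paper's formula. As a purely illustrative sketch of the general idea (flagging prompt tokens whose embeddings sit unusually close to a toxic-concept embedding, with a γ-scaled adaptive threshold), the function name, the thresholding rule, and the toy embeddings below are assumptions for illustration, not SAFREE's actual method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def flag_toxic_tokens(token_embs, concept_emb, gamma=10.0):
    """Flag tokens whose embedding is unusually close to a toxic-concept
    embedding. The adaptive threshold (mean similarity plus a gamma-scaled
    margin) is a hypothetical stand-in for the paper's self-validating
    threshold, used here only to show how gamma could modulate sensitivity."""
    sims = np.array([cosine(t, concept_emb) for t in token_embs])
    threshold = sims.mean() + sims.std() / gamma  # hypothetical rule
    return sims > threshold

# Toy example: token 0 is aligned with the concept, tokens 1-2 are not.
concept = np.array([1.0, 0.0])
tokens = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.1, 1.0])]
mask = flag_toxic_tokens(tokens, concept)  # -> [True, False, False]
```

A training-free filter of this shape needs no gradient updates: it operates on frozen text-encoder embeddings at inference time, which matches the paper's claim of being adaptable across T2I and T2V backbones.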