reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Minimalist Concept Erasure in Generative Models

Authors: Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, Kenji Kawaguchi

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluations on state-of-the-art flow-matching models demonstrate that our method robustly erases concepts without degrading overall model performance, paving the way for safer and more responsible generative models. [...] Our experimental validation confirms the robustness of our method in successfully removing the target concept, even under adversarial attacks. We show that our method surpasses baselines in erasure effectiveness, robustness against adversarial attacks, and preserving model performance.
Researcher Affiliation	Collaboration	1 National University of Singapore; 2 RWTH Aachen University; 3 PayPal Inc.; 4 Zhejiang University; 5 University of Oxford.
Pseudocode	No	The paper does not contain any sections, figures, or blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code	No	The paper does not contain any explicit statements about releasing their code or providing a link to a code repository for their methodology. While it references publicly available datasets, it does not offer its own source code.
Open Datasets	Yes	Evaluation data: We consider four concerning topics: nudity, inappropriate objects (gun, knife, drug), IP characters (Hulk, Superman, Wolverine, Captain America, Batman), and art styles (Van Gogh, Picasso, Dali, Cubism, and Monet). For each topic, we collect a set of concepts to erase. Details of the concepts included in each topic can be found in Appendix G. ... FID on five thousand LAION prompts for image quality (Schuhmann et al., 2021). Appendix G.1. Adversarial Attacks: Ring-A-Bell (Tsai et al., 2024): ... sourced from Hugging Face (https://huggingface.co/datasets/Chia15/RingABell-Nudity). MMA-Diffusion (Yang et al., 2024b): ... publicly available version (https://huggingface.co/datasets/YijunYang280/MMA-Diffusion-NSFW-adv-prompts-benchmark?not-for-all-audiences=true). Prompt4Debugging (P4D) (Chin et al., 2024): ... utilizes this dataset directly from Hugging Face (https://huggingface.co/datasets/joycenerd/p4d). Inappropriate Image Prompt (I2P) (Schramowski et al., 2023). H. Detailed Experiment Settings: ... we exclusively use the prompts from the GCC3M dataset as neutral concept prompts (Sharma et al., 2018).
Dataset Splits	Yes	Test prompts are not used for training. ... FID on five thousand LAION prompts for image quality (Schuhmann et al., 2021). ... Ablating dataset scale. Table 5 shows how the size of the unlearning dataset affects our final performance. ... with 20 data samples, we can effectively erase the target concept.
Hardware Specification	Yes	Table 6: Training Configuration for Unlearning. Hardware used 1 NVIDIA H100
Software Dependencies	No	The paper mentions "Optimizer Adam" and "Scheduler constant" in Table 6, but these are generic algorithms or types rather than specific software packages with version numbers. It also refers to "NudeNet v2" and "NudeNet v3.4" for evaluation, which have version numbers, but these are evaluation tools rather than core software dependencies for the methodology. Specific versions for programming languages (e.g., Python), frameworks (e.g., PyTorch), or other key libraries are not provided.
Experiment Setup	Yes	H.1. Default Training Config: Table 6: Training Configuration for Unlearning. Parameter Value: Batch size 4, lr_ffn 0.5, lr_norm 0.5, β 0.01, Optimizer Adam, Training Steps 400, Weight decay 1 × 10^-2, Scheduler constant, Diffusion pretrained weight FLUX.1-schnell.
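
Because the scorecard reports no released code, the Table 6 values quoted in the Experiment Setup row can be recorded as a small configuration object when attempting a reproduction. The following is a minimal sketch, not the authors' implementation. The class name, field names, and helper functions are hypothetical. Two readings are assumptions: that the two learning-rate entries are per-group learning rates for FFN and normalization parameters, and that the weight decay is 1e-2.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UnlearningConfig:
    """Values transcribed from the quoted Table 6; all field names are hypothetical."""
    batch_size: int = 4
    lr_ffn: float = 0.5          # assumed: learning rate for the FFN parameter group
    lr_norm: float = 0.5         # assumed: learning rate for the norm parameter group
    beta: float = 0.01           # "β" in the table; its role is not described in this summary
    optimizer: str = "Adam"
    training_steps: int = 400
    weight_decay: float = 1e-2   # assumed reading of the extracted table value
    scheduler: str = "constant"
    pretrained_weights: str = "FLUX.1-schnell"
    hardware: str = "1 NVIDIA H100"


def param_group_settings(cfg: UnlearningConfig) -> list:
    """Return per-group optimizer settings as plain dicts (no parameters attached).

    The two-group split is an assumption based on the two learning-rate entries.
    """
    return [
        {"name": "ffn", "lr": cfg.lr_ffn, "weight_decay": cfg.weight_decay},
        {"name": "norm", "lr": cfg.lr_norm, "weight_decay": cfg.weight_decay},
    ]


def samples_drawn(cfg: UnlearningConfig) -> int:
    """Total training samples drawn, assuming one batch per step and no accumulation."""
    return cfg.batch_size * cfg.training_steps


if __name__ == "__main__":
    cfg = UnlearningConfig()
    print(asdict(cfg))
    print(param_group_settings(cfg))
    print(samples_drawn(cfg))
```

Keeping the configuration frozen makes accidental drift from the reported values harder. The software versions that the Software Dependencies row marks as missing would still need to be chosen and recorded separately.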