Minimalist Concept Erasure in Generative Models

Authors: Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, Kenji Kawaguchi

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on state-of-the-art flow-matching models demonstrate that our method robustly erases concepts without degrading overall model performance, paving the way for safer and more responsible generative models. [...] Our experimental validation confirms the robustness of our method in successfully removing the target concept, even under adversarial attacks. We show that our method surpasses baselines in erasure effectiveness, robustness against adversarial attacks, and preserving model performance.
Researcher Affiliation | Collaboration | (1) National University of Singapore, (2) RWTH Aachen University, (3) PayPal Inc., (4) Zhejiang University, (5) University of Oxford.
Pseudocode | No | The paper does not contain any sections, figures, or blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not explicitly state that its code will be released, nor does it link to a code repository for its methodology. While it references publicly available datasets, it does not offer its own source code.
Open Datasets | Yes | Evaluation data: We consider four concerning topics: nudity, inappropriate objects (gun, knife, drug), IP characters (Hulk, Superman, Wolverine, Captain America, Batman), and art styles (Van Gogh, Picasso, Dali, Cubism, and Monet). For each topic, we collect a set of concepts to erase. Details of the concepts included in each topic can be found in Appendix G. ... FID on five thousand LAION prompts for image quality (Schuhmann et al., 2021). Appendix G.1. Adversarial Attacks: Ring-A-Bell (Tsai et al., 2024): ... sourced from Hugging Face (https://huggingface.co/datasets/Chia15/RingABell-Nudity). MMA-Diffusion (Yang et al., 2024b): ... publicly available version (https://huggingface.co/datasets/YijunYang280/MMA-Diffusion-NSFW-adv-prompts-benchmark?not-for-all-audiences=true). Prompt4Debugging (P4D) (Chin et al., 2024): ... utilizes this dataset directly from Hugging Face (https://huggingface.co/datasets/joycenerd/p4d). Inappropriate Image Prompt (I2P) (Schramowski et al., 2023). H. Detailed Experiment Settings: ... we exclusively use the prompts from the GCC3M dataset as neutral concept prompts (Sharma et al., 2018).
Dataset Splits | Yes | Test prompts are not used for training. ... FID on five thousand LAION prompts for image quality (Schuhmann et al., 2021). ... Ablating dataset scale. Table 5 shows how the size of the unlearning dataset affects our final performance. ... with 20 data samples, we can effectively erase the target concept.
Hardware Specification | Yes | Table 6: Training Configuration for Unlearning. Hardware used: 1 NVIDIA H100.
Software Dependencies | No | The paper mentions "Optimizer Adam" and "Scheduler constant" in Table 6, but these are generic algorithms or types rather than specific software packages with version numbers. It also refers to "NudeNet v2" and "NudeNet v3.4" for evaluation, which have version numbers, but these are evaluation tools rather than core software dependencies for the methodology. Specific versions for programming languages (e.g., Python), frameworks (e.g., PyTorch), or other key libraries are not provided.
Experiment Setup | Yes | H.1. Default Training Config: Table 6: Training Configuration for Unlearning. Parameter: Value. Batch size: 4; lr_ffn: 0.5; lr_norm: 0.5; β: 0.01; Optimizer: Adam; Training steps: 400; Weight decay: 1 × 10^-2; Scheduler: constant; Diffusion pretrained weight: FLUX.1-schnell.
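
Since the paper releases no code, the Table 6 settings quoted above can be captured in a small configuration object for anyone attempting a reimplementation. The following is only a sketch under stated assumptions, not the authors' setup: the class name, field names, and the `param_groups` helper are hypothetical; the weight decay is read as 1e-2 from the extracted table text; and treating `lr_ffn` and `lr_norm` as separate learning rates for feed-forward and normalization parameters, each with the same weight decay, is an interpretation of the parameter names rather than something the summary confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnlearningConfig:
    """Hypothetical container for the Table 6 values quoted above."""
    batch_size: int = 4
    lr_ffn: float = 0.5            # assumed: learning rate for feed-forward parameters
    lr_norm: float = 0.5           # assumed: learning rate for normalization parameters
    beta: float = 0.01
    optimizer: str = "Adam"
    training_steps: int = 400
    weight_decay: float = 1e-2     # assumed reading of the extracted table entry
    scheduler: str = "constant"
    pretrained_weights: str = "FLUX.1-schnell"
    hardware: str = "1 NVIDIA H100"

    @property
    def samples_seen(self) -> int:
        # Total samples processed over training: steps times batch size.
        return self.training_steps * self.batch_size


def param_groups(cfg: UnlearningConfig) -> list[dict]:
    """Build per-group optimizer settings as plain dicts (hypothetical layout)."""
    return [
        {"name": "ffn", "lr": cfg.lr_ffn, "weight_decay": cfg.weight_decay},
        {"name": "norm", "lr": cfg.lr_norm, "weight_decay": cfg.weight_decay},
    ]


cfg = UnlearningConfig()
groups = param_groups(cfg)
print(cfg.samples_seen, [g["name"] for g in groups])
```

The per-group dicts mirror the shape that optimizers with per-parameter options typically accept, so a reimplementation could attach the actual parameter tensors to each group; how the original work partitions parameters is not stated in this summary.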