EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
Authors: Daiheng Gao, Shilin Lu, Wenbo Zhou, Jiaming Chu, Jie Zhang, Mengxi Jia, Bang Zhang, Zhaoxin Fan, Weiming Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that EraseAnything successfully fills the research gap left by earlier methods in this new T2I paradigm, achieving SOTA performance across a wide range of concept erasure tasks. |
| Researcher Affiliation | Collaboration | USTC; NTU; BUPT; IHPC and CFAR, A*STAR; TeleAI; Tongyi Lab, Alibaba; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University. Correspondence to: Zhaoxin Fan & Weiming Zhang <EMAIL & EMAIL>. |
| Pseudocode | Yes | Algorithm 1 BO formulation in Erase Anything |
| Open Source Code | No | Our codebase utilizes widely adopted diffusers (von Platen et al., 2022), a popular choice among developers and researchers for DMs. |
| Open Datasets | Yes | To assess the effectiveness and versatility of our approach, we begin by applying it to the classical task of nudity erasure. Specifically, we used our concept-erased model to generate images from a comprehensive set of 4,703 prompts extracted from the Inappropriate Image Prompt (I2P) dataset (Schramowski et al., 2023). Furthermore, to evaluate the specificity of our method on regular content, we randomly select 10,000 captions from the MS-COCO captioning dataset (validation) (Lin et al., 2014). We chose a subset of CelebA (Liu et al., 2018), omitting identities that Flux [dev] couldn't accurately reconstruct. |
| Dataset Splits | No | The paper mentions subsets and selection of data for specific evaluation tasks (e.g., 4,703 prompts from I2P, 10,000 captions from MS-COCO validation), and for an ablation study: "This resulted in a dataset of 100 celebrities, split into two groups: 50 for erasure and 50 for retention." For the celebrity recognition network, it states: "Then we randomly re-sampled the dataset and divided into training set (80%) and test set (20%)." However, it does not provide explicit training, validation, and test splits for the main model being developed, which is a finetuned Flux model using LoRA modules. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. It only refers to 'Flux.1 [dev] model' and 'diffusers'. |
| Software Dependencies | No | The paper mentions several software components and libraries (e.g., 'diffusers', 'NLTK', 'GPT-4o', 'AdamW optimizer'), but it does not provide specific version numbers for any of them. For instance, 'diffusers (von Platen et al., 2022)' cites a publication year, not a software version. |
| Experiment Setup | Yes | Unless otherwise specified, our experiments employ the flow-matching Euler sampler with 28 steps and the AdamW (Loshchilov et al., 2017) optimizer for 1,000 steps, with learning rates αlow = 0.001 and αup = 0.0005 and an erasing guidance factor η = 1 under all conditions. For concept construction, we harness NLTK (Bird et al., 2009) to generate synonym concepts, and GPT-4o for the extraction of irrelevant concepts. Our fine-tuning process focuses on the text-related parameters add_q_proj and add_k_proj (subsets of Q and K) within the dual-stream blocks. Based on empirical testing, we have determined that setting τ = 0.07 is optimal for our model's performance. We train the celebrity recognition network on top of a MobileNetV2 pretrained on ImageNet, then add a GlobalAveragePooling2D and a Softmax (Dense) layer after the original output (out_relu) of MobileNetV2. The learning rate is fixed at 1e-4 with the Adam optimizer, and the loss function is categorical cross-entropy. |