DriftRemover: Hybrid Energy Optimizations for Anomaly Images Synthesis and Segmentation

Authors: Siyue Yao, Haotian Xu, Mingjie Sun, Siyue Yu, Jimin Xiao, Eng Gee Lim

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation demonstrates that our method improves pixel-level AP by 1.3% and F1-max by 1.8% in anomaly detection on the MVTec dataset. Its successful application in a practical scenario further shows its effectiveness, improving IoU by 37.2% and F-measure by 25.1% on the Floor Dirt dataset.
Researcher Affiliation | Collaboration | Siyue Yao (1, 2), Haotian Xu (3), Mingjie Sun (4), Siyue Yu (1), Jimin Xiao (1), Eng Gee Lim (1). Affiliations: (1) Xi'an Jiaotong-Liverpool University; (2) University of Liverpool; (3) Ripple Info; (4) Soochow University. Corresponding author (EMAIL).
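Pseudocode | Yes | Algorithm 1: Inference process of the proposed DriftRemover.
Input: normal image $x^n$, coarse mask $m$, input anomaly prompt $p$, total number of inference timesteps $T$.
Parameters: conditional noise predictor $\epsilon_\theta(\cdot)$, image encoder $\mathcal{E}(\cdot)$, binary mask threshold $\eta$, last timestep $\gamma$ for applying the AAR module, first timestep $\delta$ for applying the APO module, maximum number of repeats $\Gamma$, energy value threshold $\Theta$, and pre-defined parameters $\alpha_t$, $\bar{\alpha}_t$, $I$, $\beta_t$ and $\sigma_t$.
Output: the synthetic anomaly image latent $z^s_0$.
1: $z^n_0 = \mathcal{E}(x^n)$; $z^n_T \sim \mathcal{N}(0, I)$
2: $z^s_T = \mathrm{concat}\left(z^n_T, m, \mathcal{E}(x^n \odot (1 - m))\right)$
3: for $t = T$ to $1$ do
4:   $i = 0$
5:   if $t > \gamma$ then
6:     Obtain the attention maps $A^g_t$ for the new normal embedding and $A^s_t$ for the new anomaly embedding.
7:     Compute the refined mask $\hat{m}$ from $(A^g_t > \eta)$, $(A^s_t > \eta)$ and $m$ (Equation 7).
8:     while $i < \Gamma$ and $F_R(A^s_t, \hat{m}) < \Theta$ do
9:       $z^s_t \leftarrow z^s_t - \sigma_t \nabla_{z^s_t} F_R(A^s_t, \hat{m})$ (Equation 8); $i \leftarrow i + 1$
10:    end while
11:  else if $t < \delta$ then
12:    while $i < \Gamma$ and $F_O(f(z^r), f(z^s_t)) < \Theta$ do
13:      $z^s_t \leftarrow z^s_t - \sigma_t \nabla_{z^s_t} F_O(f(z^r), f(z^s_t))$ (Equation 10); $i \leftarrow i + 1$
14:    end while
15:  end if
16:  $z^s_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(z^s_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,\epsilon_\theta(z^s_t, p, t)\right) + \beta_t I$ (Equation 2)
17:  $z^s_{t-1} \leftarrow z^s_{t-1} \odot m + \left(\sqrt{\bar{\alpha}_{t-1}}\, z^n_0 + \sqrt{1 - \bar{\alpha}_{t-1}}\, I\right) \odot (1 - m)$ (Equation 6)
18: end for
19: return $z^s_0$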
Open Source Code | Yes | The code is available at https://github.com/JJessicaYao/DriftRemover.
Open Datasets | Yes | We evaluate our DriftRemover on MVTec [Bergmann et al., 2019] and the Floor Dirt dataset. MVTec's original training set consists of 3,629 normal images without any anomaly, while its original test set contains 467 normal images and 1,258 anomaly images, along with the corresponding mask labels for the anomaly areas.
Dataset Splits | Yes | MVTec's original training set consists of 3,629 normal images without any anomaly, while its original test set contains 467 normal images and 1,258 anomaly images, along with the corresponding mask labels for the anomaly areas. Following [Hu et al., 2023], we randomly select 1/3 of the anomaly images for training DriftRemover, and the remaining images are used to evaluate the downstream tasks. The Floor Dirt dataset is collected from robotic vacuum cleaners and contains two types of anomalies: stains on the floor (500 images) and pet faeces on the floor (458 images). In our experiments, 3/5 of the anomalous images are randomly selected for training DriftRemover, and the remaining 2/5 are used for testing the downstream tasks.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It only states that "Our pipeline is built on Stable Diffusion V1.5", which refers to a software model, not to hardware.
Software Dependencies | Yes | Our pipeline is built on Stable Diffusion V1.5 [Rombach et al., 2022], training it for 2,000 epochs with a batch size of 4 and an image size of 512.
Experiment Setup | Yes | Our pipeline is built on Stable Diffusion V1.5 [Rombach et al., 2022], training it for 2,000 epochs with a batch size of 4 and an image size of 512. The AdamW optimizer uses a scaled learning rate initialized to 1e-4. We use 20 steps and a guidance scale of 3.5 for image generation, producing 1,000 images per class for evaluation and training. The threshold Θ and the iteration cap Γ are 0.01 and 5, respectively. The last timestep γ for applying the AAR module is 600, while the first timestep δ for applying the APO module is 300. The binary threshold η is 180, the patch size v is 3, the text dimension q is 768, the number of heads h is 8, and the reduction factors k for the three layers are 1, 2 and 4.
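The Research Type row reports pixel-level IoU and F1-max, which this excerpt does not define. The sketch below computes both using their common definitions. It assumes flattened binary ground-truth masks (1 = anomaly) and per-pixel anomaly scores. The function names are hypothetical, and this is not the paper's evaluation code.

```python
from typing import Sequence


def iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks (1 = anomaly)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def f1_max(scores: Sequence[float], gt: Sequence[int]) -> float:
    """Best pixel-level F1 over every threshold taken from the score values."""
    best = 0.0
    for thr in sorted(set(scores)):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for p, g in zip(pred, gt) if p and g)
        fp = sum(1 for p, g in zip(pred, gt) if p and not g)
        fn = sum(1 for p, g in zip(pred, gt) if not p and g)
        if tp == 0:
            continue  # F1 is zero here and cannot improve the maximum
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For example, with scores `[0.9, 0.2, 0.6, 0.1]` and ground truth `[1, 1, 0, 0]`, the best threshold is 0.2. At that threshold precision is 2/3 and recall is 1, so F1-max is 0.8.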
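Two parts of Algorithm 1 (Pseudocode row) can be isolated cleanly:

- the capped energy-guided update in steps 8-10 and 12-14;
- the background re-noising of Equation 6 in step 17.

The following is a minimal sketch on flat Python lists, not the authors' implementation. The callables `energy` and `grad` are hypothetical stand-ins for F_R or F_O and their gradients with respect to the latent. The loop condition is copied from the algorithm as given. The `I` term in Equation 6 is assumed to denote a standard-normal noise sample.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def energy_guidance(
    z: Vector,
    energy: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    sigma: float,
    theta: float,
    max_repeats: int,
) -> Vector:
    """Steps 8-10 / 12-14: repeat z <- z - sigma * grad E(z), at most max_repeats times.

    The stopping test (energy(z) < theta) mirrors Algorithm 1 as written.
    """
    i = 0
    while i < max_repeats and energy(z) < theta:
        g = grad(z)
        z = [zi - sigma * gi for zi, gi in zip(z, g)]
        i += 1
    return z


def blend_background(
    z_prev: Vector,
    z0_normal: Vector,
    mask: Vector,
    alpha_bar_prev: float,
    rng: random.Random,
) -> Vector:
    """Equation 6: keep the generated latent inside the mask; outside it,
    use the normal latent noised to level t-1 (noise assumed ~ N(0, 1))."""
    out = []
    for zs, zn, m in zip(z_prev, z0_normal, mask):
        noise = rng.gauss(0.0, 1.0)
        noised = math.sqrt(alpha_bar_prev) * zn + math.sqrt(1.0 - alpha_bar_prev) * noise
        out.append(zs * m + noised * (1.0 - m))
    return out
```

With a quadratic toy energy `sum(x * x)` and gradient `2x`, a step size of 0.25 halves the latent on each iteration. Starting from `[1.0]`, the cap of 5 repeats (Γ in the Experiment Setup row) therefore ends at `[0.03125]`.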
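The random splits in the Dataset Splits row can be sketched as a seeded shuffle followed by a cut. This sketch makes several assumptions the excerpt does not settle:

- whether the split is made per class or per anomaly type;
- how fractional counts are rounded (floor is used here);
- which seed is used.

`random_split` is a hypothetical helper.

```python
import random
from fractions import Fraction
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def random_split(
    items: Sequence[Item], train_fraction: Fraction, seed: int = 0
) -> Tuple[List[Item], List[Item]]:
    """Shuffle a copy of items and put floor(len * train_fraction) of them in the training split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # floor rounding is an assumption
    return shuffled[:n_train], shuffled[n_train:]
```

Under floor rounding, MVTec's 1,258 anomaly images with a fraction of 1/3 give 419 training and 839 test images. The 500 Floor Dirt stain images with 3/5 give 300 and 200. Exact fractions avoid floating-point surprises when computing the cut.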
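The Experiment Setup row's hyperparameters can be collected into one configuration object. The sketch also shows how γ = 600 and δ = 300 partition the 20 sampling steps. It assumes Stable Diffusion's 1,000 training timesteps and an evenly spaced schedule of 950, 900, ..., 0. The actual scheduler's spacing and offset may differ, and all field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DriftRemoverConfig:
    # Values quoted from the Experiment Setup row; field names are hypothetical.
    epochs: int = 2000
    batch_size: int = 4
    image_size: int = 512
    learning_rate: float = 1e-4          # AdamW, initial value
    inference_steps: int = 20
    guidance_scale: float = 3.5
    energy_threshold: float = 0.01       # Theta
    max_repeats: int = 5                 # Gamma
    aar_last_timestep: int = 600         # gamma
    apo_first_timestep: int = 300        # delta
    mask_threshold: int = 180            # eta
    patch_size: int = 3                  # v
    text_dim: int = 768                  # q
    num_heads: int = 8                   # h
    reduction_factors: Tuple[int, ...] = (1, 2, 4)  # k per layer


def sampled_timesteps(cfg: DriftRemoverConfig, train_steps: int = 1000) -> List[int]:
    """Assumed evenly spaced schedule, from high noise to low noise."""
    stride = train_steps // cfg.inference_steps
    return [k * stride for k in reversed(range(cfg.inference_steps))]


def phase_counts(cfg: DriftRemoverConfig) -> Dict[str, int]:
    """Count sampled steps that run AAR (t > gamma), APO (t < delta), or neither."""
    counts = {"AAR": 0, "APO": 0, "none": 0}
    for t in sampled_timesteps(cfg):
        if t > cfg.aar_last_timestep:
            counts["AAR"] += 1
        elif t < cfg.apo_first_timestep:
            counts["APO"] += 1
        else:
            counts["none"] += 1
    return counts
```

Under these assumptions, the split is:

- 7 of the 20 steps fall in the AAR phase (950 down to 650);
- 6 fall in the APO phase (250 down to 0);
- the 7 steps from 600 to 300 use neither module.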