DZAD: Diffusion-based Zero-shot Anomaly Detection

Authors: Tianrui Zhang, Liang Gao, Xinyu Li, Yiping Gao

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The method is compared with 7 recent state-of-the-art (SOTA) methods on the MVTec AD and VisA datasets, and ablation studies analyze the role of each component. The experiments demonstrate that the method is effective and outperforms existing methods.
Researcher Affiliation | Academia | Tianrui Zhang (1, 2), Liang Gao (1), Xinyu Li (1, *), Yiping Gao (1). (1) Huazhong University of Science and Technology; (2) National University of Singapore.
Pseudocode | No | The paper describes its methods using natural language and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about code availability, nor does it provide a link to a code repository.
Open Datasets | Yes | MVTec AD dataset: the MVTec AD dataset (Bergmann et al. 2019) simulates real-world industrial production scenarios. VisA dataset: the VisA dataset (Zou et al. 2022) consists of 12 subsets corresponding to 12 different objects. BTAD dataset: the BTAD dataset (Mishra et al. 2021) specifically targets anomaly detection in industrial texture quality control.
Dataset Splits | Yes | MVTec AD dataset: the training set contains 3,629 images with only normal samples, and the test set consists of 1,725 images. VisA dataset: it includes 10,821 images, with 9,621 normal and 1,200 anomalous samples.
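As a quick arithmetic check on the split sizes reported above, the snippet below only restates the published counts (no other data is assumed) and confirms they are internally consistent: MVTec AD has 5,354 images in total, and the VisA normal and anomalous counts add up to the reported total, with roughly 11.1% of VisA images anomalous.

```python
# Split sizes as reported in the paper (no other data assumed).
MVTEC_TRAIN, MVTEC_TEST = 3629, 1725
VISA_NORMAL, VISA_ANOMALOUS, VISA_TOTAL = 9621, 1200, 10821

mvtec_total = MVTEC_TRAIN + MVTEC_TEST         # all MVTec AD images
visa_sum = VISA_NORMAL + VISA_ANOMALOUS        # should equal the reported total
anomalous_share = VISA_ANOMALOUS / VISA_TOTAL  # fraction of anomalous VisA images

print(f"MVTec AD total: {mvtec_total}")                     # MVTec AD total: 5354
print(f"VisA counts consistent: {visa_sum == VISA_TOTAL}")  # VisA counts consistent: True
print(f"VisA anomalous share: {anomalous_share:.1%}")       # VisA anomalous share: 11.1%
```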
Hardware Specification | Yes | Training is conducted for 1,100 epochs on a single NVIDIA A100 40GB GPU with a batch size of 32.
Software Dependencies | No | The paper mentions using the Adam optimizer and the DDIM sampler but does not provide specific version numbers for any software libraries, programming languages, or environments.
Experiment Setup | Yes | In this experiment, all images are resized to 256 × 256. We employ ResNet50 as the feature extraction network and select layers n ∈ {2, 3, 4} as the feature layers for calculating anomaly localization. Training is conducted for 1,100 epochs on a single NVIDIA A100 40GB GPU with a batch size of 32. We use the Adam optimizer (Loshchilov and Hutter 2019) with a learning rate of 1e-5. For anomaly detection, the image-level anomaly score is the maximum value of the anomaly localization map after it is processed by eight rounds of global average pooling, each with a size of 8 × 8. The initial denoising timestep T is set to 1,400 during inference. We employ DDIM (Song, Meng, and Ermon 2021) as the default sampler with 10 steps.
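The two inference settings in the row above (10 DDIM steps starting from T = 1,400, and an image score taken as the maximum of a pooled localization map) can be sketched as follows. This is a hypothetical illustration, not the authors' code, since none is released. It assumes the 10 timesteps are evenly spaced downward from T (stride 140), and it reads each "round" of 8 × 8 pooling as a local average pool with stride 1 and no padding, because the paper does not state stride or padding. The names `ddim_timesteps`, `avg_pool`, and `image_score` are illustrative only.

```python
"""Hypothetical sketch of two DZAD inference settings (not the authors' code).

Assumptions: the 10 DDIM timesteps are evenly spaced downward from T = 1,400,
and each pooling round is an 8x8 average pool with stride 1 and no padding.
"""
from typing import List

Grid = List[List[float]]


def ddim_timesteps(t_start: int = 1400, num_steps: int = 10) -> List[int]:
    """Return num_steps evenly spaced, decreasing timesteps starting at t_start.

    Sampling would step from each listed timestep to the next one and finally
    to timestep 0 (uniform spacing is an assumption).
    """
    stride = t_start // num_steps
    return [t_start - i * stride for i in range(num_steps)]


def avg_pool(grid: Grid, k: int) -> Grid:
    """k x k average pooling, stride 1, no padding, via an integral image."""
    h, w = len(grid), len(grid[0])
    if h < k or w < k:
        raise ValueError("map is smaller than the pooling window")
    # ii[i][j] holds the sum of grid[0..i-1][0..j-1] (zero first row/column).
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        row_sum = 0.0
        for j in range(w):
            row_sum += grid[i][j]
            ii[i + 1][j + 1] = ii[i][j + 1] + row_sum
    area = k * k
    return [
        [
            (ii[i + k][j + k] - ii[i][j + k] - ii[i + k][j] + ii[i][j]) / area
            for j in range(w - k + 1)
        ]
        for i in range(h - k + 1)
    ]


def image_score(anomaly_map: Grid, k: int = 8, rounds: int = 8) -> float:
    """Image-level score: max of the localization map after repeated pooling."""
    pooled = anomaly_map
    for _ in range(rounds):
        pooled = avg_pool(pooled, k)  # each round shrinks the map by k - 1
    return max(max(row) for row in pooled)


if __name__ == "__main__":
    print(ddim_timesteps())  # [1400, 1260, 1120, 980, 840, 700, 560, 420, 280, 140]
    # Toy 256 x 256 localization map: zeros with a bright 32 x 32 patch.
    amap = [[0.0] * 256 for _ in range(256)]
    for i in range(100, 132):
        for j in range(100, 132):
            amap[i][j] = 1.0
    print(round(image_score(amap), 3))
```

Repeated local averaging suppresses isolated single-pixel spikes in the localization map, so the maximum reflects spatially coherent anomalous regions; under the no-padding assumption, eight 8 × 8 rounds reduce a 256 × 256 map to 200 × 200 before the maximum is taken.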