DZAD: Diffusion-based Zero-shot Anomaly Detection
Authors: Tianrui Zhang, Liang Gao, Xinyu Li, Yiping Gao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comparison with 7 recent state-of-the-art (SOTA) methods on the MVTec AD and VisA datasets, and analysis of the role of each component in ablation studies. The experiments demonstrate the validity of the method beyond the existing methods. |
| Researcher Affiliation | Academia | Tianrui Zhang¹ ², Liang Gao¹, Xinyu Li¹*, Yiping Gao¹. ¹Huazhong University of Science and Technology; ²National University of Singapore. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using natural language and mathematical equations, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about code availability, nor does it provide a link to a code repository. |
| Open Datasets | Yes | MVTec AD dataset. The MVTec AD (Bergmann et al. 2019) dataset simulates real-world industrial production scenarios. VisA dataset. The VisA (Zou et al. 2022) dataset consists of 12 subsets corresponding to 12 different objects. BTAD dataset. The BTAD dataset (Mishra et al. 2021) specifically targets anomaly detection tasks within industrial texture quality control. |
| Dataset Splits | Yes | MVTec AD dataset. The training set contains 3629 images with only normal samples, and the test set consists of 1725 images. VisA dataset. It includes 10,821 images, with 9,621 normal and 1,200 anomalous samples. |
| Hardware Specification | Yes | The training is conducted for 1100 epochs on a single NVIDIA A100 40GB GPU, with a batch size of 32. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and DDIM sampler but does not provide specific version numbers for any software libraries, programming languages, or environments. |
| Experiment Setup | Yes | In this experiment, all images are resized to 256 × 256. We employ ResNet50 as the feature extraction network and select n ∈ {2, 3, 4} as the feature layers for calculating anomaly localization. The training is conducted for 1100 epochs on a single NVIDIA A100 40GB GPU, with a batch size of 32. We use the Adam optimizer (Loshchilov and Hutter 2019) with a learning rate of 1e-5. For anomaly detection, the anomaly score of the image is derived from the maximum value of the anomaly localization score, which is processed through eight rounds of global average pooling, each with a size of 8 × 8. The initial denoising timestep T is set to 1,400 during inference. We employ DDIM (Song, Meng, and Ermon 2021) as the default sampler with 10 steps. |
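
Since the paper provides neither pseudocode nor released code, the following is only a minimal sketch of how the reported setup and the image-level scoring step might be written down. The class name `DZADSetup` and the functions `average_pool` and `image_anomaly_score` are hypothetical names introduced here for illustration. The stride of 1 and the clipping of pooling windows at the image border are assumptions, as the table does not state how the pooling is padded or strided.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DZADSetup:
    """Hyperparameters as reported in the Experiment Setup row."""
    image_size: Tuple[int, int] = (256, 256)
    backbone: str = "ResNet50"
    feature_layers: Tuple[int, ...] = (2, 3, 4)
    epochs: int = 1100
    batch_size: int = 32
    learning_rate: float = 1e-5
    optimizer: str = "Adam"
    initial_timestep: int = 1400   # initial denoising timestep T at inference
    sampler: str = "DDIM"
    sampler_steps: int = 10
    pool_rounds: int = 8           # rounds of average pooling
    pool_size: int = 8             # pooling window is pool_size x pool_size


def average_pool(score_map: List[List[float]], size: int) -> List[List[float]]:
    """One average-pooling pass over a 2D score map.

    Assumption: stride 1, with windows clipped at the borders so the
    output keeps the input shape. The source does not specify this.
    """
    h, w = len(score_map), len(score_map[0])
    lo, hi = size // 2, size - size // 2  # window covers [i - lo, i + hi)
    out = []
    for i in range(h):
        rows = range(max(0, i - lo), min(h, i + hi))
        out_row = []
        for j in range(w):
            cols = range(max(0, j - lo), min(w, j + hi))
            total = sum(score_map[r][c] for r in rows for c in cols)
            out_row.append(total / (len(rows) * len(cols)))
        out.append(out_row)
    return out


def image_anomaly_score(score_map: List[List[float]],
                        setup: DZADSetup = DZADSetup()) -> float:
    """Image-level score: maximum of the repeatedly smoothed localization map."""
    smoothed = score_map
    for _ in range(setup.pool_rounds):
        smoothed = average_pool(smoothed, setup.pool_size)
    return max(max(row) for row in smoothed)
```

Taking the maximum after smoothing, rather than on the raw map, means a single noisy pixel contributes less to the image-level score than a spatially coherent anomalous region. This pure-Python loop is written for readability; a practical implementation would more likely use a tensor library's pooling operator.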