Towards VLM-based Hybrid Explainable Prompt Enhancement for Zero-Shot Industrial Anomaly Detection

Authors: Weichao Cai, Weiliang Huang, Yunkang Cao, Chao Huang, Fei Yuan, Bob Zhang, Jie Wen

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on seven real-world industrial anomaly detection datasets have shown that the proposed method not only outperforms recent SOTA methods, but also its explainable prompts provide the model with a more intuitive basis for anomaly identification.
Researcher Affiliation | Academia | Weichao Cai (1), Weiliang Huang (2), Yunkang Cao (3), Chao Huang (4), Fei Yuan (1), Bob Zhang (2), Jie Wen (5). (1) School of Information, Xiamen University; (2) Department of Computer and Information Science, University of Macau; (3) School of Robotics, Hunan University; (4) School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University; (5) School of Computer Science & Technology, Harbin Institute of Technology, Shenzhen.
Pseudocode | No | The paper describes the methodology using figures and textual descriptions (e.g., Figures 1, 2, and 3 provide overviews and detailed components), but it does not contain explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an unambiguous statement of code release or a link to a source-code repository for the methodology described.
Open Datasets | Yes | We conduct experiments using seven industrial anomaly detection datasets for all experiments: MVTec AD [Bergmann et al., 2021], VisA [Zou et al., 2022], MPDD [Jezek et al., 2021], BTAD [Mishra et al., 2021], KSDD [Tabernik et al., 2020], DAGM [Wieler and Hahn, 2007], and DTD-Synthetic [Aota et al., 2023].
Dataset Splits | No | The paper lists several datasets used for experiments, but it does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | All experiments were performed with a single NVIDIA A100 GPU (80GB).
Software Dependencies | No | We adopted QWen2-VL-72B [Wang et al., 2024b] to generate detailed descriptions of the anomalies. Furthermore, QWen2.5-7B [Yang et al., 2024] is utilized to extract anomalous information and judge the presence of anomalies. The pre-trained CLIP (ViT-L/14@336px) [Radford et al., 2021] is employed as the backbone for subsequent ZSIAD models, extracting patch embeddings from the 6th, 12th, 18th, and 24th ViT blocks. DINOv2 (ViT-S) [Oquab et al., 2024] is adopted as the VFM. While specific models/architectures are named with their respective publications, the paper does not list specific software libraries (e.g., PyTorch, TensorFlow) with version numbers.
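The multi-level feature extraction described here (patch embeddings tapped from the 6th, 12th, 18th, and 24th ViT blocks) is typically done with forward hooks. The sketch below illustrates that hook pattern on a small stand-in stack of transformer blocks, not the actual CLIP ViT-L/14@336px backbone; the sizes and block modules are placeholder assumptions.

```python
# Sketch: capturing intermediate patch embeddings from selected transformer
# blocks via forward hooks. A toy stack stands in for CLIP ViT-L/14@336px;
# DEPTH/DIM/N_PATCHES are illustrative, not the real model's sizes.
import torch
import torch.nn as nn

DEPTH, DIM, N_PATCHES = 24, 64, 16      # toy sizes (ViT-L does have 24 blocks)
TAP_BLOCKS = [6, 12, 18, 24]            # 1-indexed blocks tapped, as reported

blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
    for _ in range(DEPTH)
)

features = {}

def make_hook(idx):
    def hook(module, inputs, output):
        features[idx] = output.detach()  # patch embeddings after block idx
    return hook

for idx in TAP_BLOCKS:
    blocks[idx - 1].register_forward_hook(make_hook(idx))

x = torch.randn(1, N_PATCHES, DIM)      # stand-in for the patch-token sequence
with torch.no_grad():
    for blk in blocks:
        x = blk(x)

print(sorted(features), tuple(features[6].shape))
```

With the real backbone, the same hooks would be registered on the chosen residual blocks of the CLIP visual transformer instead of this toy stack.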
Experiment Setup | Yes | We trained the proposed method for 5 epochs with a learning rate of 0.01.
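The only stated hyperparameters are 5 epochs and a learning rate of 0.01. A minimal sketch of such a setup follows; the model, optimizer choice, loss, and data below are placeholders, since the report does not say which optimizer or objective the paper used.

```python
# Hedged sketch of the reported setup (5 epochs, lr = 0.01). Everything
# except those two numbers is a placeholder assumption.
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                                # stand-in trainable module
opt = torch.optim.SGD(model.parameters(), lr=0.01)     # optimizer choice assumed
loss_fn = nn.CrossEntropyLoss()                        # objective assumed

x = torch.randn(32, 8)                                 # dummy batch
y = torch.randint(0, 2, (32,))

losses = []
for epoch in range(5):                                 # 5 epochs, as reported
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```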