Towards VLM-based Hybrid Explainable Prompt Enhancement for Zero-Shot Industrial Anomaly Detection
Authors: Weichao Cai, Weiliang Huang, Yunkang Cao, Chao Huang, Fei Yuan, Bob Zhang, Jie Wen
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on seven real-world industrial anomaly detection datasets have shown that the proposed method not only outperforms recent SOTA methods, but also its explainable prompts provide the model with a more intuitive basis for anomaly identification. |
| Researcher Affiliation | Academia | Weichao Cai¹, Weiliang Huang², Yunkang Cao³, Chao Huang⁴, Fei Yuan¹, Bob Zhang², Jie Wen⁵. ¹School of Information, Xiamen University; ²Department of Computer and Information Science, University of Macau; ³School of Robotics, Hunan University; ⁴School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University; ⁵School of Computer Science & Technology, Harbin Institute of Technology, Shenzhen. |
| Pseudocode | No | The paper describes the methodology using figures and textual descriptions (Figures 1, 2, and 3 give an overview of the method and detail its components), but it does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an unambiguous statement of code release or a link to a source-code repository for the methodology described. |
| Open Datasets | Yes | We conduct experiments using the seven industrial anomaly detection datasets for all experiments: MVTec AD [Bergmann et al., 2021], VisA [Zou et al., 2022], MPDD [Jezek et al., 2021], BTAD [Mishra et al., 2021], KSDD [Tabernik et al., 2020], DAGM [Wieler and Hahn, 2007], and DTD-Synthetic [Aota et al., 2023]. |
| Dataset Splits | No | The paper lists several datasets used for experiments, but it does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | All experiments were performed with a single NVIDIA A100 GPU (80GB). |
| Software Dependencies | No | We adopted QWen2-VL-72B [Wang et al., 2024b] to generate detailed descriptions of the anomalies. Furthermore, QWen2.5-7B [Yang et al., 2024] is utilized to extract anomalous information and judge the presence of anomalies. The pre-trained CLIP (ViT-L/14@336px) [Radford et al., 2021] is employed as the backbone for subsequent ZSIAD models, extracting patch embeddings from the 6th, 12th, 18th, and 24th ViT blocks. DINOv2 (ViT-S) [Oquab et al., 2024] is adopted as the VFM. While specific models/architectures are named with their respective publications, the paper does not list specific software libraries (e.g., PyTorch, TensorFlow) with version numbers. |
| Experiment Setup | Yes | We trained the proposed method for 5 epochs with a learning rate of 0.01. |
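The backbone details in the Software Dependencies row (CLIP ViT-L/14@336px, with patch embeddings taken from the 6th, 12th, 18th, and 24th blocks) can be illustrated with a minimal NumPy sketch. It assumes the per-block token outputs have already been collected and projected into the same space as the text-prompt embeddings. The functions `select_patch_tokens` and `anomaly_map`, the softmax temperature, and the layer-averaged scoring rule are hypothetical and follow the generic CLIP-style zero-shot recipe, not the authors' (unreleased) implementation. With 336-pixel inputs and 14-pixel patches, each block yields a 24 × 24 grid of patch tokens plus one class token.

```python
import numpy as np

# Blocks named in the paper (1-indexed) for CLIP ViT-L/14@336px.
SELECTED_BLOCKS = (6, 12, 18, 24)
PATCH = 14
IMAGE_SIZE = 336
GRID = IMAGE_SIZE // PATCH  # 24 patches per side -> 576 patch tokens


def select_patch_tokens(block_outputs, blocks=SELECTED_BLOCKS):
    """Pick the chosen blocks and drop the class token at index 0.

    block_outputs: list with one array per ViT block, each of shape
    (1 + GRID * GRID, D). Assumption: tokens are already projected into
    the joint image-text space. Returns a list of (GRID * GRID, D) arrays.
    """
    return [block_outputs[b - 1][1:] for b in blocks]


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anomaly_map(patch_tokens_per_layer, text_normal, text_abnormal, temperature=0.07):
    """Hypothetical scoring rule: per layer, softmax over cosine similarity to a
    [normal, abnormal] prompt pair, then average the abnormal probability over layers."""
    text = l2_normalize(np.stack([text_normal, text_abnormal]))  # (2, D)
    maps = []
    for tokens in patch_tokens_per_layer:
        logits = l2_normalize(tokens) @ text.T / temperature  # (N, 2)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        maps.append(probs[:, 1])  # probability of the "abnormal" prompt
    return np.mean(maps, axis=0).reshape(GRID, GRID)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 16  # toy embedding width; the real joint-space width differs
    blocks = [rng.standard_normal((1 + GRID * GRID, D)) for _ in range(24)]
    tokens = select_patch_tokens(blocks)
    amap = anomaly_map(tokens, rng.standard_normal(D), rng.standard_normal(D))
    print(amap.shape)  # (24, 24)
```

Averaging over layers is only one simple fusion choice. Multi-level features are commonly motivated by shallow blocks carrying low-level texture cues and deep blocks carrying more semantic ones; a full pipeline would also upsample the 24 × 24 map to the input resolution.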