Aligning and Prompting Anything for Zero-Shot Generalized Anomaly Detection

Authors: Jitao Ma, Weiying Xie, Hangyu Ye, Daixun Li, Leyuan Fang

AAAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 13 real-world anomaly detection datasets demonstrate that TPS achieves superior ZGAD performance across highly diverse datasets from industrial and medical domains. Code: https://github.com/majitao-xd/TPS
Researcher Affiliation | Academia | 1 State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China; 2 College of Electrical and Information Engineering, Hunan University, Changsha 410082, China. EMAIL, EMAIL, EMAIL, EMAIL, leyuan EMAIL
Pseudocode | No | The paper describes the methodology using prose and mathematical equations in the 'Method' section, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured, code-like steps for its procedures.
Open Source Code | Yes | Code: https://github.com/majitao-xd/TPS
Open Datasets | Yes | To study anomaly classification and segmentation performance, we conduct experiments on 13 publicly available datasets, covering various industrial inspection scenarios and medical imaging domains (including photography, endoscopy, and radiology) to evaluate the performance of TPS. In industrial inspection, we consider MVTec AD (Bergmann et al. 2019), VisA (Zou et al. 2022), MPDD (Jezek et al. 2021), BTAD (Mishra et al. 2021), SD-saliency-900 (Song, Song, and Yan 2020) and RSDDS-113 (Niu et al. 2020). In medical imaging, we consider brain tumor detection datasets Head CT (Salehi et al. 2021), Brain MRI (Salehi et al. 2021), Br35H (Ahmed 2020), skin cancer detection dataset ISIC (Codella et al. 2018), colon polyp detection dataset Kvasir (Jha et al. 2020), thyroid nodule detection dataset TN3K (Gong et al. 2021) and lung cancer segmentation dataset MSD (Antonelli et al. 2022).
Dataset Splits | Yes | We fine-tune TPS using the test set of MVTec AD (Bergmann et al. 2019) and evaluate the ZGAD performance on other datasets. Only the model tested on MVTec AD is trained with the test set of VisA (Zou et al. 2022) as an auxiliary dataset. All experiments are performed in PyTorch-1.11.0 with a single NVIDIA RTX 3090 24GB GPU. For datasets that do not provide public test set labels, we validate model performance using the training set.
Hardware Specification | Yes | All experiments are performed in PyTorch-1.11.0 with a single NVIDIA RTX 3090 24GB GPU.
Software Dependencies | Yes | All experiments are performed in PyTorch-1.11.0 with a single NVIDIA RTX 3090 24GB GPU.
Experiment Setup | Yes | Implementation details: In this paper, we use the publicly available CLIP model (ViT-L/14-336) as our backbone; the code of CLIP for LAION-400M (Schuhmann et al. 2021) and LAION-5B (Schuhmann et al. 2022) scale pre-training is open-sourced by OpenCLIP (Ilharco et al. 2021). It contains 24 layers, which are divided into 4 stages. Each stage is composed of 6 layers. We extract the output of the patch tokens for each stage as z_x^i, thus N = 4, K = [6, 12, 18, 24]. We extract the last layer of linearly projected class tokens for classification optimization. All test images are scaled to 518×518 and fed into the backbone. In the training phase, the CLIP weights are frozen and the pathway module is trainable. Specifically, the output from CLIP includes global textual embeddings t_g, shunt textual embeddings t_s, visual embeddings v_x, and patch token embeddings {z_x^i}, i ∈ K. ... our loss function is expressed as: L = CE(C_x, y_c) + Focal(Up(S_x), y_s) + Dice(Up(S_x), y_s)
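The quoted loss combines a cross-entropy term on the image-level classification C_x with focal and Dice terms on the upsampled segmentation map Up(S_x). A minimal NumPy sketch of that composition is below; it is an illustration of the three standard loss terms, not the authors' code, and all function names, shapes, and hyperparameters (e.g. gamma = 2.0) are assumptions.

```python
import numpy as np

def cross_entropy(logits, label):
    # Softmax cross-entropy for the image-level prediction C_x vs. class label y_c.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-np.log(p[label] + 1e-8))

def focal_loss(prob_map, target_map, gamma=2.0):
    # Pixel-wise focal loss on the upsampled anomaly map Up(S_x) vs. mask y_s:
    # down-weights easy pixels by (1 - p_t)^gamma.
    p_t = np.where(target_map == 1, prob_map, 1.0 - prob_map)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-8)))

def dice_loss(prob_map, target_map, eps=1e-8):
    # Dice loss: 1 minus the soft overlap between prediction and mask.
    inter = (prob_map * target_map).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob_map.sum() + target_map.sum() + eps))

def total_loss(logits, label, prob_map, target_map):
    # L = CE(C_x, y_c) + Focal(Up(S_x), y_s) + Dice(Up(S_x), y_s)
    return (cross_entropy(logits, label)
            + focal_loss(prob_map, target_map)
            + dice_loss(prob_map, target_map))
```

Under these assumptions, a near-perfect segmentation map and correct class logits yield a total loss close to zero, while confident wrong predictions are penalized by all three terms.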