LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction

Authors: Er Jin, Qihui Feng, Yongli Mou, Gerhard Lakemeyer, Stefan Decker, Oliver Simons, Johannes Stegmaier

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments to evaluate the effectiveness of our algorithm using three SOTA AVLMs: GPT-4o (Achiam et al. 2023), LLaVA 1.6 (Liu et al. 2024b), LLaVA 1.5 (Liu et al. 2024a). Detailed information on the deployment and versions of AVLMs is provided in Appendix A.4. All experiments are training-free. Our evaluations are performed on two datasets, MVTec AD and MVTec LOCO AD (Bergmann et al. 2019, 2022). We perform one-shot experiments using a single training image and compare our results with competing methods, including full-shot approaches trained on all available images.
Researcher Affiliation | Academia | 1 Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany; 2 Department of Computer Science, RWTH Aachen University, Aachen, Germany; 3 Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin, Germany; 4 Independent Researcher. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper includes "Figure 2: Pipeline overview of LogicAD", a block diagram describing the system components. The "Logic Reasoner" section describes the logical formalisms and the use of Prover9 for theorem proving, but these are presented as descriptive text and logical formulae, not in a structured pseudocode or algorithm block.
Open Source Code | Yes | The dataset, code and supplementary materials: https://jasonjin34.github.io/logicad.github.io/
Open Datasets | Yes | The dataset, code and supplementary materials: https://jasonjin34.github.io/logicad.github.io/ ... Our evaluations are performed on two datasets, MVTec AD and MVTec LOCO AD (Bergmann et al. 2019, 2022).
Dataset Splits | Yes | We perform one-shot experiments using a single training image and compare our results with competing methods, including full-shot approaches trained on all available images.
Hardware Specification | No | Detailed information on the deployment and versions of AVLMs is provided in Appendix A.4. The main body of the paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | Yes | We conduct comprehensive experiments to evaluate the effectiveness of our algorithm using three SOTA AVLMs: GPT-4o (Achiam et al. 2023), LLaVA 1.6 (Liu et al. 2024b), LLaVA 1.5 (Liu et al. 2024a). ... We use Prover9 for theorem proving. ... using Grounding DINO as our ROI extraction model (Liu et al. 2023a). ... text-embedding-3-large from OpenAI (Achiam et al. 2023)
Experiment Setup | Yes | (1) Generating a set of regions w_i, where i ∈ [1, N] and N represents the total number of ROIs, using the function f_GDINO with feature prompts. Each region, along with the original image, is then processed by the function f_AVLM K = 3 times, yielding a collection of textual descriptions T = {t_1, t_2, t_3}. (2) Constructing the text embedding space M by applying the text embedding model f_emb, text-embedding-3-large from OpenAI (Achiam et al. 2023), to T, resulting in M = f_emb(T) = {e_i}_{i=1}^{k}, where e_i is the embedding of the extracted text. Subsequently, these text embeddings are fed into an outlier detection model, specifically the Local Outlier Factor (LOF) function f_LOF, to generate the filtered text embedding space, denoted T_filter. We then randomly select the corresponding text from the filtered embedding space T_filter.
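Step (2) of the setup, filtering repeated AVLM descriptions by outlier detection in embedding space, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the embedding vectors are hard-coded stand-ins for what text-embedding-3-large would return, the description strings are placeholders, and five samples are used (rather than the paper's K = 3) so that LOF has enough neighbors.

```python
# Sketch of the LOF-based filtering of AVLM text descriptions.
# Stand-in 2-D embeddings: four consistent descriptions cluster together,
# one inconsistent (e.g., hallucinated) description lands far away.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

embeddings = np.array([
    [0.10, 0.92], [0.12, 0.90], [0.11, 0.91], [0.13, 0.93],
    [0.95, 0.05],  # outlier: description inconsistent with the others
])
descriptions = ["t1", "t2", "t3", "t4", "t5"]  # placeholder texts

# fit_predict labels inliers +1 and outliers -1.
lof = LocalOutlierFactor(n_neighbors=2)
labels = lof.fit_predict(embeddings)

# Keep only the consistent descriptions (the filtered space T_filter).
t_filter = [t for t, lab in zip(descriptions, labels) if lab == 1]
print(t_filter)  # → ['t1', 't2', 't3', 't4']
```

In the paper's pipeline, one description would then be drawn at random from T_filter; here any of the four surviving placeholders would do.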