Language-Assisted Feature Transformation for Anomaly Detection
Authors: EungGu Yun, Heonjin Ha, Yeongwoo Nam, Bryan Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both toy and real-world datasets validate the effectiveness of our method. |
| Researcher Affiliation | Industry | EungGu Yun SAIGE Seoul, South Korea Heonjin Ha LG Uplus Seoul, South Korea Yeongwoo Nam Alsemy Inc. Seoul, South Korea Bryan Dongik Lee Independent Seoul, South Korea |
| Pseudocode | Yes | E ALGORITHM The following pseudocode demonstrates the implementation of LAFT-AD, using a syntax similar to NumPy, as the notation used in (Radford et al., 2021). |
| Open Source Code | Yes | We provide the source code of our method at https://github.com/yuneg11/LAFT. |
| Open Datasets | Yes | Datasets To validate our approach, we used the colored version of MNIST (LeCun et al., 2010), Waterbirds (Sagawa et al., 2019), and CelebA (Liu et al., 2015) datasets for semantic anomaly detection (SAD). ... we also used the MVTec AD (Bergmann et al., 2019) and VisA (Zou et al., 2022) datasets for industrial anomaly detection (IAD). |
| Dataset Splits | Yes | Dataset Split Colored MNIST R denotes red, G denotes green, and B denotes blue colored digits. 0-4 and 5-9 denote the digits from 0 to 4 and from 5 to 9, respectively. Train: R/0-4 (16.67%) Test: R/0-4 (16.67%), R/5-9 (16.67%), GB/0-4 (33.33%), GB/5-9 (33.33%) Waterbirds Wbird denotes waterbirds, and Lbird denotes landbirds. Wback denotes water background, and Lback denotes land background. Train: Wbird/Wback (22.04%) Test: Wbird/Wback (11.08%), Wbird/Lback (11.08%), Lbird/Wback (38.92%), Lbird/Lback (38.92%) CelebA Blond denotes blond hair, and Glass denotes eyeglasses. -Blond denotes non-blond hair, and -Glass denotes no eyeglasses. Train: Blond/-Glass (14.66%) Test: Blond/Glass (13.01%), Blond/-Glass (0.31%), -Blond/Glass (80.53%), -Blond/-Glass (6.15%) MVTec AD and VisA We use the same split as Bergmann et al. (2019) and Zou et al. (2022). |
| Hardware Specification | Yes | We use a single NVIDIA RTX 3090 GPU for all experiments. |
| Software Dependencies | No | The paper mentions specific vision-language models and their backbones (CLIP ViT-B/16, OpenCLIP, EVA-CLIP, SigLIP, CoCa) but does not provide version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Hyperparameter The only hyperparameter in LAFT is the number of PCA components d. We typically choose d from 4 to 32 when guiding an attribute and from 32 to 384 when ignoring an attribute. Refer to the Ablation Study for the impact of d on the performance. And we use k = 30 for the methods using kNN anomaly scoring (kNN and LAFT-AD). |
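The paper's own pseudocode is not reproduced in this report, but the two quoted ingredients of the setup, a PCA-derived subspace with d components and kNN anomaly scoring with k = 30, can be sketched in NumPy-style Python. This is an illustrative sketch only: the helper names, the random stand-in embeddings, and the use of prompt-pair difference vectors to define the subspace are assumptions for the example, not code from the paper.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of the rows of X, via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d]  # shape (d, dim)

def knn_anomaly_score(train_feats, test_feats, k=30):
    """Anomaly score = mean Euclidean distance to the k nearest train features."""
    d2 = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    knn_dists = np.sort(np.sqrt(d2), axis=1)[:, :k]
    return knn_dists.mean(axis=1)

# Hypothetical usage with random stand-in data (real inputs would be
# CLIP image embeddings and text-prompt difference vectors).
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 64))        # normal-class image embeddings
test = rng.normal(size=(10, 64))          # embeddings to score
text_diffs = rng.normal(size=(200, 64))   # prompt-pair difference vectors

B = pca_basis(text_diffs, d=8)            # d in [4, 32] when guiding an attribute
scores = knn_anomaly_score(train @ B.T, test @ B.T, k=30)
```

Higher scores mark likely anomalies; in practice k = 30 requires at least 30 training features, and d is chosen per the ablation ranges quoted above.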