Language-Assisted Feature Transformation for Anomaly Detection

Authors: EungGu Yun, Heonjin Ha, Yeongwoo Nam, Bryan Lee

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on both toy and real-world datasets validate the effectiveness of our method.
Researcher Affiliation | Industry | EungGu Yun, SAIGE, Seoul, South Korea; Heonjin Ha, LG Uplus, Seoul, South Korea; Yeongwoo Nam, Alsemy Inc., Seoul, South Korea; Bryan Dongik Lee, Independent, Seoul, South Korea
Pseudocode | Yes | (Appendix E: Algorithm) The following pseudocode demonstrates the implementation of LAFT AD, using a syntax similar to NumPy, following the notation used in Radford et al. (2021).
Open Source Code | Yes | We provide the source code of our method at https://github.com/yuneg11/LAFT.
Open Datasets | Yes | Datasets: To validate our approach, we used the colored version of MNIST (LeCun et al., 2010), Waterbirds (Sagawa et al., 2019), and CelebA (Liu et al., 2015) datasets for semantic anomaly detection (SAD). ... we also used the MVTec AD (Bergmann et al., 2019) and VisA (Zou et al., 2022) datasets for industrial anomaly detection (IAD).
Dataset Splits | Yes |
Colored MNIST (R = red, G = green, B = blue colored digits; 0-4 and 5-9 denote the digits from 0 to 4 and from 5 to 9): Train: R/0-4 (16.67%); Test: R/0-4 (16.67%), R/5-9 (16.67%), GB/0-4 (33.33%), GB/5-9 (33.33%).
Waterbirds (Wbird = waterbirds, Lbird = landbirds; Wback = water background, Lback = land background): Train: Wbird/Wback (22.04%); Test: Wbird/Wback (11.08%), Wbird/Lback (11.08%), Lbird/Wback (38.92%), Lbird/Lback (38.92%).
CelebA (Blond = blond hair, Glass = eyeglasses; -Blond = non-blond hair, -Glass = no eyeglasses): Train: Blond/-Glass (14.66%); Test: Blond/Glass (13.01%), Blond/-Glass (0.31%), -Blond/Glass (80.53%), -Blond/-Glass (6.15%).
MVTec AD and VisA: We use the same splits as Bergmann et al. (2019) and Zou et al. (2022).
Hardware Specification | Yes | We use a single NVIDIA RTX 3090 GPU for all experiments.
Software Dependencies | No | The paper mentions specific vision-language models and their backbones (CLIP ViT-B/16, OpenCLIP, EVA-CLIP, SigLIP, CoCa) but does not provide version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | Hyperparameters: The only hyperparameter in LAFT is the number of PCA components d. We typically choose d from 4 to 32 when guiding an attribute and from 32 to 384 when ignoring an attribute; refer to the Ablation Study for the impact of d on performance. We use k = 30 for the methods using kNN anomaly scoring (kNN and LAFT AD).
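
The setup above names two ingredients: a language-guided feature transform with a single hyperparameter d (the number of PCA components, used either to keep or to remove an attribute subspace) and kNN anomaly scoring with k = 30. A minimal NumPy sketch of that pipeline is shown below; the function names, the mean-centering step, and the use of Euclidean distance are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import numpy as np

def laft_transform(image_feats, prompts_a, prompts_b, d=8, guide=True):
    """Sketch of a LAFT-style transform (assumed details).

    Builds a concept subspace from differences of paired text embeddings
    (prompts_a[i] vs. prompts_b[i]), keeps the top-d PCA directions, and
    either projects image features onto that subspace (guide=True) or
    removes it (guide=False). The report suggests d in 4-32 for guiding
    and 32-384 for ignoring an attribute.
    """
    diffs = prompts_a - prompts_b                      # (num_pairs, dim)
    diffs = diffs - diffs.mean(axis=0, keepdims=True)  # centering: an assumption
    # Top-d principal directions via SVD of the difference matrix.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    basis = vt[:d]                                     # (d, dim)
    proj = image_feats @ basis.T @ basis               # projection onto subspace
    return proj if guide else image_feats - proj

def knn_anomaly_score(test_feats, train_feats, k=30):
    """kNN anomaly score: mean distance to the k nearest training features."""
    dists = np.linalg.norm(
        test_feats[:, None, :] - train_feats[None, :, :], axis=-1
    )                                                  # (n_test, n_train)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)                        # higher = more anomalous
```

Note that the guided and ignored features are complementary by construction: projecting onto the subspace and projecting it out sum back to the original features, which is why a single d controls both modes.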