AnoLLM: Large Language Models for Tabular Anomaly Detection
Authors: Che-Ping Tsai, Ganyu Teng, Phillip Wallis, Wei Ding
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results indicate that AnoLLM delivers the best performance on six benchmark datasets with mixed feature types. Additionally, across 30 datasets from the ODDS library, which are predominantly numerical, AnoLLM performs on par with top-performing baselines. |
| Researcher Affiliation | Industry | Che-Ping Tsai, Ganyu Teng, Phil Wallis, Wei Ding (Amazon) |
| Pseudocode | No | The paper describes the methods through textual explanations and mathematical equations (e.g., Eqn. 1, Eqn. 5, Eqn. 6) and process descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper contains no unambiguous statement releasing its own source code, nor does it provide a direct link to a code repository for the AnoLLM framework. It mentions using the PyOD and DeepOD libraries for baselines, but not for the proposed method. |
| Open Datasets | Yes | Datasets: Since popular anomaly detection benchmarks, such as ADBench (Han et al., 2022) and the ODDS library (Rayana, 2016), mainly consist of numerical features, we manually collect six datasets that contain mixed types of features. The six datasets are derived from the ODDS library (Rayana, 2016), the fraud dataset benchmarks (Grover et al., 2022), and Kaggle. The dataset statistics are described in Table 1. To demonstrate the ability of AnoLLM to accommodate numerical columns, we also evaluate the approach on 30 datasets from the ODDS library, which are mainly composed of numerical features. The ODDS library is collected from various domains, such as chemistry, healthcare, and astronautics. |
| Dataset Splits | Yes | Evaluation protocols: Following prior works (Shenkar & Wolf, 2022; Xu et al., 2023b), we conduct experiments in an uncontaminated, unsupervised setting. The training set consists of a random sample of 50% from the pool of normal examples, with the test set comprising the remaining normal examples, along with all anomalies. We randomly split each dataset using 5 different random seeds and reported the averaged results. |
| Hardware Specification | Yes | Finetuning and inference are performed on seven Nvidia A100 40GB GPUs hosted on Amazon EC2 P4 Instances. [...] The total compute required to train AnoLLM-135M across all datasets with five seeds, including six datasets from the mixed-type benchmark and 30 datasets from the ODDS benchmark, is approximately 90 GPU hours on a single RTX-A6000 GPU with 48 GB of memory. |
| Software Dependencies | No | The paper mentions using an AdamW optimizer (Loshchilov & Hutter, 2019), the PyOD library (Zhao et al., 2019) and DeepOD library (Xu et al., 2023a) for baselines, and a LoRA adapter (Hu et al., 2022). However, specific version numbers for these software components or other key libraries (like PyTorch, TensorFlow, or Python) are not provided. |
| Experiment Setup | Yes | Fine-tuning is conducted for 2,000 steps with an AdamW optimizer (Loshchilov & Hutter, 2019) with learning rate 5 × 10⁻⁵ across all datasets as the training loss converges uniformly. Batch sizes are adjusted for each dataset to accommodate the varying lengths of serialized data. During inference, we select the number of permutations r = 21 since further increasing r does not result in any observed improvement. [...] Detailed hyperparameters are shown in Table 7 of the Appendix. |
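The evaluation protocol quoted under Dataset Splits (train on a random 50% of the normal examples, test on the remaining normals plus all anomalies, averaged over 5 seeds) can be sketched as follows. This is a minimal sketch, not the authors' code; the function name and the convention that label 0 marks normal rows are assumptions.

```python
import numpy as np

def split_uncontaminated(X, y, seed, train_frac=0.5):
    """Uncontaminated unsupervised split: train on a random fraction of
    the normal examples; test on the remaining normals plus all anomalies.
    Assumes y == 0 marks normal rows and y == 1 marks anomalies."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)
    anomaly_idx = np.flatnonzero(y == 1)
    normal_idx = rng.permutation(normal_idx)
    n_train = int(len(normal_idx) * train_frac)
    train_idx = normal_idx[:n_train]
    # Test set: remaining normals first, then every anomaly.
    test_idx = np.concatenate([normal_idx[n_train:], anomaly_idx])
    return X[train_idx], X[test_idx], y[test_idx]

# Per the paper's protocol, results are averaged over 5 random seeds, e.g.:
# results = [evaluate(*split_uncontaminated(X, y, seed)) for seed in range(5)]
```

Note that the training set contains no anomalies by construction, which is what makes the setting "uncontaminated".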
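The inference step quoted under Experiment Setup averages scores over r = 21 random feature permutations. A minimal sketch of that idea, assuming a hypothetical `nll_fn` that returns the language model's negative log-likelihood for a serialized row, and a simple "column is value" serialization (both are assumptions; the paper's exact serialization and scoring may differ):

```python
import numpy as np

def anomaly_score(nll_fn, row, r=21, seed=0):
    """Average a scoring function over r random column-order
    permutations of the serialized row (hypothetical nll_fn)."""
    rng = np.random.default_rng(seed)
    cols = list(row.keys())
    scores = []
    for _ in range(r):
        perm = rng.permutation(cols)
        text = ", ".join(f"{c} is {row[c]}" for c in perm)
        scores.append(nll_fn(text))
    return float(np.mean(scores))
```

Averaging over permutations removes the dependence of an autoregressive model's likelihood on any one fixed column order; per the quoted setup, increasing r beyond 21 gave no observed improvement.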