Disentangling Tabular Data Towards Better One-Class Anomaly Detection
Authors: Jianan Ye, Zhaorui Tan, Yijie Hu, Xi Yang, Guangliang Cheng, Kaizhu Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 20 tabular datasets show that our method substantially outperforms the state-of-the-art methods and leads to an average performance improvement of 6.1% on AUC-PR and 2.1% on AUC-ROC. Tables 1 and 2 present the AUC-PR and AUC-ROC results of our method alongside the competing methods across 20 datasets, respectively. |
| Researcher Affiliation | Academia | Jianan Ye1,2, Zhaorui Tan1,2, Yijie Hu1,2, Xi Yang1, Guangliang Cheng2, Kaizhu Huang3* 1 School of Advanced Technology, Xi'an Jiaotong-Liverpool University; 2 School of Electrical Engineering, Electronics and Computer Science, University of Liverpool; 3 Data Science Research Center, Duke Kunshan University EMAIL |
| Pseudocode | No | The paper describes the methodology in detail in Section 3 but does not provide any explicitly labeled pseudocode or algorithm blocks. The figures illustrate the strategy and framework but do not contain code-like structures. |
| Open Source Code | Yes | Codes are available at https://github.com/yjnanan/Disent-AD. |
| Open Datasets | Yes | Our evaluation encompasses 20 tabular datasets, aligning with previous work (Yin et al. 2024). 12 of them are obtained from the Outlier Detection Datasets (ODDS) (Rayana 2016), while the remainder are derived from ADBench (Han et al. 2022). |
| Dataset Splits | Yes | We randomly sample 50% of the normal samples as the training set, and the remaining normal samples with all anomaly samples are combined into the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Deep OD (Xu et al. 2023a), an open-source Python library' and 'MCM is based on their official open-source code', and that parameters are optimized by Adam, but it does not specify version numbers for any programming languages or libraries (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | In our network architecture, a three-layer Multilayer Perceptron (MLP) with Leaky ReLU activation function forms the encoder, and the decoder is symmetrically designed to the encoder. For datasets applied with the preprocessing method, we set epochs to 200 and the channel number C of latent features to 512 for efficient convergence, while epochs to 100 and C to 128 for the rest. Due to the large variation in the number of samples between datasets, ranging from 129 to 299,285, we use different batch sizes for different datasets. The parameters of the network are optimized by Adam with a uniform learning rate of 1e-4 for all datasets. |
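The one-class split protocol quoted above (half of the normal samples for training; the remaining normals plus all anomalies for testing) can be sketched as follows. The function name `split_one_class` and the random seed handling are illustrative, not from the paper's released code.

```python
import numpy as np

def split_one_class(X, y, train_frac=0.5, seed=0):
    """Illustrative sketch of the paper's split protocol: sample
    `train_frac` of the normal samples (label 0) as the training set;
    the remaining normals and all anomalies (label 1) form the test set."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)
    anomaly_idx = np.flatnonzero(y == 1)
    rng.shuffle(normal_idx)  # random sampling of normals
    n_train = int(len(normal_idx) * train_frac)
    X_train = X[normal_idx[:n_train]]
    test_idx = np.concatenate([normal_idx[n_train:], anomaly_idx])
    return X_train, X[test_idx], y[test_idx]
```

Note that the training labels are all normal by construction, so no `y_train` is returned.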
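The architecture described in the Experiment Setup row (a three-layer MLP encoder with Leaky ReLU, a symmetric decoder, and Adam at a learning rate of 1e-4) could look roughly like the sketch below. The intermediate layer widths and the `AutoEncoder` class name are assumptions; the paper only fixes the latent channel number C (512 or 128) and the optimizer settings.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Sketch of the described encoder/decoder; layer widths other
    than the latent size C are illustrative guesses."""
    def __init__(self, in_dim, latent_dim=512):
        super().__init__()
        # Three-layer MLP encoder with Leaky ReLU activations.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Decoder designed symmetrically to the encoder.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder(in_dim=32, latent_dim=512)
# Uniform learning rate of 1e-4 across all datasets, per the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Per the report, batch size and epoch count (100 or 200) vary with the dataset, so they are left out of this sketch.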