Disentangling Tabular Data Towards Better One-Class Anomaly Detection
Authors: Jianan Ye, Zhaorui Tan, Yijie Hu, Xi Yang, Guangliang Cheng, Kaizhu Huang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 20 tabular datasets show that our method substantially outperforms the state-of-the-art methods and leads to an average performance improvement of 6.1% on AUC-PR and 2.1% on AUC-ROC. Tables 1 and 2 present the AUC-PR and AUC-ROC results of our method alongside the competing methods across 20 datasets, respectively. |
| Researcher Affiliation | Academia | Jianan Ye1,2, Zhaorui Tan1,2, Yijie Hu1,2, Xi Yang1, Guangliang Cheng2, Kaizhu Huang3* 1 School of Advanced Technology, Xi'an Jiaotong-Liverpool University; 2 School of Electrical Engineering, Electronics and Computer Science, University of Liverpool; 3 Data Science Research Center, Duke Kunshan University EMAIL |
| Pseudocode | No | The paper describes the methodology in detail in Section 3 but does not provide any explicitly labeled pseudocode or algorithm blocks. The figures illustrate the strategy and framework but do not contain code-like structures. |
| Open Source Code | Yes | Codes are available at https://github.com/yjnanan/Disent-AD. |
| Open Datasets | Yes | Our evaluation encompasses 20 tabular datasets, aligning with previous work (Yin et al. 2024). 12 of them are obtained from the Outlier Detection Datasets (ODDS) (Rayana 2016), while the remainder are derived from ADBench (Han et al. 2022). |
| Dataset Splits | Yes | We randomly sample 50% of the normal samples as the training set, and the remaining normal samples with all anomaly samples are combined into the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Deep OD (Xu et al. 2023a), an open-source Python library' and 'MCM is based on their official open-source code', and that parameters are optimized by Adam, but it does not specify version numbers for any programming languages or libraries (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | In our network architecture, a three-layer Multilayer Perceptron (MLP) with Leaky ReLU activation function forms the encoder, and the decoder is symmetrically designed to the encoder. For datasets applied with the preprocessing method, we set epochs to 200 and the channel number C of latent features to 512 for efficient convergence, while epochs to 100 and C to 128 for the rest. Due to the large variation in the number of samples between datasets, ranging from 129 to 299,285, we use different batch sizes for different datasets. The parameters of the network are optimized by Adam with a uniform learning rate of 1e-4 for all datasets. |
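The one-class split protocol quoted above (half of the normal samples for training; the remaining normals plus all anomalies for testing) can be sketched as follows. The function name `split_one_class` and the random seed handling are illustrative, not from the paper's released code.

```python
import numpy as np

def split_one_class(X, y, train_frac=0.5, seed=0):
    """Illustrative sketch of the paper's split protocol: sample
    `train_frac` of the normal samples (label 0) as the training set;
    the remaining normals and all anomalies (label 1) form the test set."""
    rng = np.random.default_rng(seed)
    normal_idx = np.flatnonzero(y == 0)
    anomaly_idx = np.flatnonzero(y == 1)
    rng.shuffle(normal_idx)  # random sampling of normals
    n_train = int(len(normal_idx) * train_frac)
    X_train = X[normal_idx[:n_train]]
    test_idx = np.concatenate([normal_idx[n_train:], anomaly_idx])
    return X_train, X[test_idx], y[test_idx]
```

Note that the training labels are all normal by construction, so no `y_train` is returned.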
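The architecture described in the Experiment Setup row (a three-layer MLP encoder with Leaky ReLU, a symmetric decoder, and Adam at a learning rate of 1e-4) could look roughly like the sketch below. The intermediate layer widths and the `AutoEncoder` class name are assumptions; the paper only fixes the latent channel number C (512 or 128) and the optimizer settings.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Sketch of the described encoder/decoder; layer widths other
    than the latent size C are illustrative guesses."""
    def __init__(self, in_dim, latent_dim=512):
        super().__init__()
        # Three-layer MLP encoder with Leaky ReLU activations.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim),
        )
        # Decoder designed symmetrically to the encoder.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(),
            nn.Linear(latent_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder(in_dim=32, latent_dim=512)
# Uniform learning rate of 1e-4 across all datasets, per the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Per the report, batch size and epoch count (100 or 200) vary with the dataset, so they are left out of this sketch.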