Towards a Unified Framework of Clustering-based Anomaly Detection
Authors: Zeyu Fang, Ming Gu, Sheng Zhou, Jiawei Chen, Qiaoyu Tan, Haishuai Wang, Jiajun Bu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, involving 17 baseline methods across 30 diverse datasets, validate the effectiveness and generalization capability of the proposed method, surpassing state-of-the-art methods. |
| Researcher Affiliation | Academia | ¹Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University, Hangzhou, China; ²The State Key Laboratory of Blockchain and Data Security, Hangzhou, China; ³New York University Shanghai, Shanghai, China. Correspondence to: Sheng Zhou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Model Training for UniCAD |
| Open Source Code | Yes | The code for reproducing our experiments is publicly available at https://github.com/BabelTower/UniCAD. |
| Open Datasets | Yes | We evaluated UniCAD on an extensive collection of datasets, comprising 30 tabular datasets that span 16 diverse fields. We specifically focused on naturally occurring anomaly patterns, rather than synthetically generated or injected anomalies, as this aligns more closely with real-world scenarios. The detailed descriptions are provided in Table 4 of Appendix D.1. Following the setup in ADBench (Han et al., 2022), we adopt an inductive setting to predict newly emerging data, a highly beneficial approach for practical applications. |
| Dataset Splits | No | The paper mentions using an "inductive setting to predict newly emerging data" and evaluation using "AUC-ROC and AUC-PR metrics" following the setup in ADBench (Han et al., 2022). However, it does not explicitly provide specific percentages, counts, or a detailed methodology for how the training, validation, and test splits were performed for the 30 datasets used in its experiments. |
| Hardware Specification | No | The paper provides a "Runtime Comparison" in Table 2 but does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments or training the models. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and states that a "two-layer MLP" was employed, but it does not specify any software frameworks (e.g., PyTorch, TensorFlow) or library version numbers (e.g., Python 3.x, scikit-learn x.x.x). |
| Experiment Setup | Yes | For all datasets, we employ a two-layer MLP with a hidden dimension of d = 128 and ReLU activation function as both encoder and decoder. We utilize the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-4 for 100 epochs. For the EM process, we set the maximum iteration number to 100 and a tolerance of 1e-3 for stopping training when the objectives converge. The number of components in the mixture model is set as k = 10, and the proportion of the outlier is set as l = 1%. |
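The reported setup can be summarized in code. The sketch below is illustrative only (the paper does not release these exact names or a framework choice): it collects the stated hyperparameters in a config dict and shows the shape of the two-layer MLP with ReLU that serves as both encoder and decoder. All key names and the `two_layer_mlp` helper are assumptions, not the authors' code.

```python
# Hedged sketch of the UniCAD experiment setup as reported in the paper.
# Config keys and function names are illustrative assumptions.
CONFIG = {
    "hidden_dim": 128,           # d = 128
    "activation": "ReLU",
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "epochs": 100,
    "em_max_iters": 100,         # EM stopping: max iterations
    "em_tolerance": 1e-3,        # EM stopping: objective tolerance
    "num_components": 10,        # k, mixture components
    "outlier_proportion": 0.01,  # l = 1%
}

def relu(v):
    """Elementwise ReLU over a plain-list vector."""
    return [max(0.0, x) for x in v]

def linear(v, W, b):
    """Affine layer: W is out_dim x in_dim, b has length out_dim."""
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def two_layer_mlp(x, W1, b1, W2, b2):
    """Forward pass of a two-layer MLP with ReLU, matching the
    reported encoder/decoder architecture (hidden dim = 128)."""
    return linear(relu(linear(x, W1, b1)), W2, b2)
```

In practice the encoder and decoder would be trained jointly with the EM procedure in Algorithm 1; this fragment only pins down the architecture and hyperparameter values quoted above.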