Open-Set Graph Anomaly Detection via Normal Structure Regularisation
Authors: Qizhou Wang, Guansong Pang, Mahsa Salehi, Xiaokun Xia, Christopher Leckie
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results on seven real-world datasets show that NSReg significantly outperforms state-of-the-art competing methods by at least 14% AUC-ROC on the unseen anomaly classes and by 10% AUC-ROC on all anomaly classes. |
| Researcher Affiliation | Academia | Qizhou Wang (The University of Melbourne), Guansong Pang (Singapore Management University), Mahsa Salehi (Monash University), Xiaokun Xia (The University of Tokyo), Christopher Leckie (The University of Melbourne) |
| Pseudocode | Yes | Python-style pseudocode for training is provided in Appendix B.1. Algorithm 1 Training NSReg |
| Open Source Code | Yes | Code and datasets are available at https://github.com/mala-lab/NSReg. |
| Open Datasets | Yes | Extensive empirical results on seven real-world datasets show that NSReg significantly outperforms state-of-the-art competing methods by at least 14% AUC-ROC on the unseen anomaly classes and by 10% AUC-ROC on all anomaly classes. ... For imbalanced node classification datasets with multiple minor classes, such as Photo, Computers, and CS (Shchur et al., 2018)... In the case of Yelp (Rayana & Akoglu, 2015) and T-Finance (Tang et al., 2022)... Three large-scale attributed graph datasets, ogbn-arxiv, ogbn-proteins (Hu et al., 2020), and T-Finance (Tang et al., 2022) are also adapted to evaluate NSReg at scale. More details about the datasets are presented in Appendix B.2. |
| Dataset Splits | Yes | For each dataset, we treat one of the anomaly classes as the seen anomaly, with the other anomaly classes as unseen anomalies. We alternate this process for all anomaly classes, and report the results averaged over all cases. All experiments are repeated for 5 random runs. ... In our default setting, 50 anomalies are used for training along with 5% of randomly selected normal nodes, while the remaining data is reserved for evaluation. ... The Python-style code for the evaluation protocol is presented in Appendix B.3. Algorithm 2 Open-set GAD Experimental Protocol |
| Hardware Specification | Yes | Our experiments are conducted using a single NVIDIA A100 GPU and 28 CPU cores from an AMD EPYC 7663 Processor on a HPC cluster. |
| Software Dependencies | Yes | NSReg is implemented in Python and makes extensive use of PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). We summarise the main scientific computing libraries and their versions used for our implementation as follows: python==3.8.13, pytorch==1.10.1 (py3.8_cuda11.3_cudnn8.2.0_0), pytorch_geometric==2.0.4 (py38_torch_1.10.0_cu113), numpy==1.23.5, scipy==1.10.0, scikit-learn==1.2.1, cudatoolkit==11.3.1, dgl==1.0.2 |
| Experiment Setup | Yes | NSReg is optimised using the Adam (Kingma & Ba, 2014) optimiser for 200 epochs for the Photo, Computers, and CS datasets with a learning rate of 1e-3, and for 400 epochs for Yelp with a learning rate of 5e-3 due to its larger number of nodes. We set the number of labelled anomalies to 50 and the percentage of labelled normal nodes to 5% by default. Each batch of training nodes includes all labelled anomalies and a randomly sampled set of labelled normal nodes, capped at 512. Similarly, each batch of relations is capped at 512, with an equal number of samples for each relation type. The default value of α is set to 0.8 and λ is set to 1 for all datasets. |
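The open-set split protocol quoted under "Dataset Splits" can be sketched as follows. This is not the authors' Appendix B.3 code, only a minimal NumPy illustration of the stated rules: 50 anomalies drawn from the single *seen* anomaly class, 5% of normal nodes for training, and everything else (including all unseen-class anomalies) reserved for evaluation. The function name `open_set_split` and the `normal_class` label convention are assumptions for illustration; the full protocol additionally alternates the seen class over all anomaly classes and averages results.

```python
import numpy as np

def open_set_split(labels, seen_class, rng,
                   n_seen_anomalies=50, normal_frac=0.05, normal_class=0):
    """Sketch of the open-set GAD split: one anomaly class is 'seen'
    during training; all other anomaly classes stay unseen until test."""
    labels = np.asarray(labels)
    # Training anomalies: sampled only from the seen anomaly class.
    seen_idx = np.flatnonzero(labels == seen_class)
    train_anom = rng.choice(seen_idx, size=n_seen_anomalies, replace=False)
    # Training normals: a random fraction (default 5%) of all normal nodes.
    normal_idx = np.flatnonzero(labels == normal_class)
    n_norm = int(normal_frac * len(normal_idx))
    train_norm = rng.choice(normal_idx, size=n_norm, replace=False)
    # Remaining nodes, incl. every unseen-class anomaly, form the test set.
    train = np.concatenate([train_anom, train_norm])
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test
```

Alternating `seen_class` over each anomaly class and averaging the resulting scores reproduces the reported per-dataset protocol.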
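The batch composition described under "Experiment Setup" (every batch contains all labelled anomalies plus randomly sampled labelled normals, with the normal portion capped at 512) admits a similarly small sketch. The helper name `sample_training_batch` is hypothetical, not from the NSReg codebase:

```python
import numpy as np

def sample_training_batch(anom_idx, norm_idx, rng, cap=512):
    """Sketch: a training batch = all labelled anomaly indices plus
    up to `cap` randomly sampled labelled normal indices."""
    n_norm = min(cap, len(norm_idx))
    norm_batch = rng.choice(norm_idx, size=n_norm, replace=False)
    return np.concatenate([np.asarray(anom_idx), norm_batch])
```

With only 50 labelled anomalies, including all of them in every batch keeps the rare supervised anomaly signal present at each optimisation step while the cap bounds per-batch cost on large graphs.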