Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Rui Miao, Kaize Ding, Ying Wang, Shirui Pan, Xin Wang

ICLR 2025

Reproducibility assessment (each variable is listed with its result, followed by the supporting LLM response):
Research Type: Experimental
To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaLy Detection (UB-GOLD), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, OOD sensitivity spectrum, robustness, and efficiency of existing methods, shedding light on their strengths and limitations.
Researcher Affiliation: Academia
Jilin University, Griffith University, Northwestern University
Pseudocode: No
The paper describes various algorithms (graph kernel with detector, self-supervised learning with detector, GNN-based GLAD methods, GNN-based GLOD methods) in Section 3.2 and Appendix C, but it does not provide any structured pseudocode or algorithm blocks. The methods are described in paragraph form or as categorized lists.
Open Source Code: Yes
Furthermore, we provide an open-source codebase (https://github.com/UB-GOLD/UB-GOLD) of UB-GOLD to foster reproducible research and outline potential directions for future investigations based on our insights.
Open Datasets: Yes
Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. ... Our datasets are publicly available and include TUDataset, OGB, TOX21, DrugOOD, and GOOD. Among them, TUDataset (Morris et al., 2020), OGB (Hu et al., 2020), and TOX21 (Abdelaziz et al., 2016) are licensed under the MIT License. DrugOOD (Ji et al., 2023) is licensed under the GNU General Public License 3.0. GOOD (Gui et al., 2022) is licensed under GPL-3.0.
Dataset Splits: Yes
Data split. In our target scenarios (i.e., unsupervised GLAD/GLOD), all the samples in the training set are normal/ID, while the anomaly/OOD samples only occur in the testing set. In such an unsupervised case, the validation set with anomaly/OOD samples is usually unavailable during the training phase. Thus, following the implementation of OpenOOD (Zhang et al., 2023), we divide the datasets into training and testing sets, without using a validation set. Specifically, we adopted the splits from (Liu et al., 2023a) and (Li et al., 2022), applying them to the benchmark datasets. Detailed splits are provided in Table 1.
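The split protocol quoted above (training set contains only normal/ID graphs; anomalies/OOD samples appear only at test time) can be sketched as follows. This is a minimal illustration, not the benchmark's actual code; the convention that label 1 marks an anomalous/OOD graph and the `train_frac` value are assumptions for the example.

```python
import random

def unsupervised_split(labels, train_frac=0.8, seed=0):
    """Split indices so that the training set is purely normal/ID (label 0).

    The test set mixes the held-out normal graphs with ALL anomalous/OOD
    graphs, matching the unsupervised GLAD/GLOD setting (no validation set).
    """
    rng = random.Random(seed)
    normal = [i for i, y in enumerate(labels) if y == 0]
    anomalous = [i for i, y in enumerate(labels) if y == 1]
    rng.shuffle(normal)
    n_train = int(train_frac * len(normal))
    train_idx = normal[:n_train]
    test_idx = normal[n_train:] + anomalous
    return train_idx, test_idx

# Toy example: 10 normal graphs, 3 anomalies.
labels = [0] * 10 + [1] * 3
train_idx, test_idx = unsupervised_split(labels)
```

Note that because no anomaly-containing validation set exists, model selection cannot rely on anomaly labels during training, which is exactly why the benchmark forgoes a validation split.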
Hardware Specification: Yes
All our experiments were carried out on a Linux server with an Intel(R) Xeon(R) Gold 5120 2.20GHz CPU, 160GB RAM, and an NVIDIA A40 GPU with 48GB of memory.
Software Dependencies: Yes
This toolkit is built on top of PyTorch 2.0.1 (Paszke et al., 2019), torch_geometric 2.4.0 (Fey & Lenssen, 2019), and DGL 2.1.0 (Wang et al., 2019). We implement graph kernel methods with the DGL library. All other models are unified using the torch_geometric library. GCL and IG are included via the PyGCL library (Zhu et al., 2021).
Experiment Setup: Yes
Hyperparameter search. To obtain the performance upper bounds of various methods on GLAD/GLOD tasks, we conduct a random search to find the optimal hyperparameters w.r.t. their performance on the testing set. The search space is detailed in Table 4. The random search is conducted 20 times or for a maximum of one day per method per dataset to ensure fairness.
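The search protocol described above (sample random configurations, stopping after 20 trials or a one-day wall-clock budget, and keep the best-scoring one) can be sketched as below. The `search_space` contents and the `evaluate` callable are illustrative stand-ins for the paper's Table 4 and a real training-plus-evaluation run, not the benchmark's actual API.

```python
import random
import time

# Illustrative search space; the real one is given in Table 4 of the paper.
search_space = {"lr": [1e-2, 1e-3, 1e-4], "hidden_dim": [16, 32, 64]}

def random_search(evaluate, space, n_trials=20, budget_sec=24 * 3600, seed=0):
    """Random hyperparameter search with a trial cap and a wall-clock cap."""
    rng = random.Random(seed)
    start = time.time()
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        if time.time() - start > budget_sec:
            break  # one-day cap per method per dataset
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)  # e.g. test AUROC of the trained detector
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy evaluator standing in for a full train-and-test run.
cfg, score = random_search(lambda c: c["hidden_dim"] / 64, search_space)
```

Note that, as the quote states, configurations are scored on the testing set directly, so the reported numbers are performance upper bounds rather than estimates of deployment performance.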