UniFORM: Towards Unified Framework for Anomaly Detection on Graphs

Authors: Chuancheng Song, Xixun Lin, Hanyang Shen, Yanmin Shang, Yanan Cao

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on real-world datasets demonstrate that UniFORM significantly outperforms state-of-the-art methods across multiple granularities.
Researcher Affiliation Academia Chuancheng Song1,2, Xixun Lin1,2, Hanyang Shen1,2, Yanmin Shang1,2, Yanan Cao1,2* 1Institute of Information Engineering, Chinese Academy of Sciences 2School of Cyber Security, University of Chinese Academy of Sciences
Pseudocode No The paper does not contain any clearly labeled pseudocode or algorithm blocks. Methodologies are described in paragraph text and mathematical formulations.
Open Source Code No The paper does not explicitly state that source code is provided or offer any links to a code repository.
Open Datasets Yes We conduct experiments using datasets from three distinct domains: Research Networks (Cora, Pubmed, COLLAB), Social Networks (BlogCatalog, Flickr, Enron, IMDB, Reddit), and Commercial Networks (Yelp, IBMAML).
Dataset Splits No The paper mentions using both ground-truth and artificially injected anomalies following (Liu et al. 2021) and evaluates using AUC, but it does not provide specific details on the training/validation/test splits (e.g., percentages or exact counts) for any of the datasets.
Hardware Specification Yes All models were run on Python 3.9.19, NVIDIA Tesla V100 GPU, 629GB RAM, and 2.20GHz Intel Xeon E5-2650 CPU.
Software Dependencies Yes All models were run on Python 3.9.19, NVIDIA Tesla V100 GPU, 629GB RAM, and 2.20GHz Intel Xeon E5-2650 CPU.
Experiment Setup Yes For efficiency and performance, we fixed the sampled community size c (central component plus c−1 hop neighbors for ego-graphs, and random walk steps for subgraph fragments) to 4. For isolated nodes or those in smaller communities, nodes are repeatedly sampled until an overlapping community of the desired size is formed. In Langevin Dynamics, we select ϵ = 0.3 and T = 25, justified below. The energy-based GNN uses 2 layers (K = 2) to extract information from small communities, with an embedding dimension f = 64. Batch size is set to 300 for each dataset. All models are optimized using the Adam optimizer. Training epochs are 200 for Cora, Pubmed, BlogCatalog, and Flickr; 400 for Enron, IMDB, and Reddit; and 600 for COLLAB, Yelp, and IBMAML. Learning rates are 0.001 for Cora, Pubmed, BlogCatalog, and Flickr, and 0.003 for the others.
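The Langevin Dynamics configuration above (ϵ = 0.3, T = 25) can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the quadratic energy E(x) = x²/2 and the function names are placeholders, since the actual energy comes from the paper's energy-based GNN, which is not reproduced here.

```python
import math
import random

def grad_energy(x):
    # Placeholder gradient: dE/dx for the toy energy E(x) = x^2 / 2.
    # In the paper this would be the gradient of the energy-based GNN's output.
    return x

def langevin_sample(x0, epsilon=0.3, T=25, seed=0):
    """Run T Langevin steps with step size epsilon:
    x <- x - (epsilon / 2) * dE/dx + sqrt(epsilon) * z,  z ~ N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(T):
        z = rng.gauss(0.0, 1.0)
        x = x - 0.5 * epsilon * grad_energy(x) + math.sqrt(epsilon) * z
    return x

sample = langevin_sample(5.0)
```

With the toy quadratic energy the chain contracts toward the low-energy region around 0 while the noise term keeps it stochastic, which is the mechanism the reported ϵ and T control.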