A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

Authors: Junjun Pan, Yixin Liu, Xin Zheng, Yizhen Zheng, Alan Wee-Chung Liew, Fuyi Li, Shirui Pan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on 6 datasets demonstrate that HUGE significantly outperforms competitors, showcasing its effectiveness and robustness."
Researcher Affiliation | Academia | 1. School of Information and Communication Technology, Griffith University, Queensland, Australia; 2. Faculty of Information Technology, Monash University, Melbourne, Australia; 3. Faculty of Health and Medical Sciences, The University of Adelaide, South Australia, Australia
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing code, a link to a code repository, or any mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | "We conduct experiments on six public real-world GFD datasets, covering domains ranging from social networks to e-commerce, including Amazon (Dou et al. 2020), Facebook (Xu et al. 2022), Reddit, Yelp Chi (Kumar, Zhang, and Leskovec 2019), Amazon Full (McAuley and Leskovec 2013), and Yelp Chi Full (Rayana and Akoglu 2015)."
Dataset Splits | No | The paper reports average performance over five runs with different random seeds on six real-world GFD datasets, but the main text does not specify how the datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts).
Hardware Specification | Yes | "OOM indicates out-of-memory on a 24GB GPU."
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "To evaluate the sensitivity of HUGE to the hyper-parameter α, we adjust its value across {0.0, 0.5, 1.0, 1.5, 2.0}. The results are illustrated in Figure 3(a). Overall, HUGE is not sensitive to variation in α on Facebook and Reddit, and slightly sensitive on Amazon and Yelp Chi. Moreover, HUGE achieves the best performance at α = 0.5, with a noticeable drop when α = 0 on Amazon and Facebook. These findings suggest HUGE performs best when neighbor information and ego information are balanced. Additionally, the small performance gap between α = 0 and α = 1 on Reddit and Yelp Chi shows that these datasets require less neighbor information for effective fraud detection."
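The sensitivity protocol quoted above (a grid of α values, each averaged over five seeded runs) can be sketched as a simple sweep. Here `train_and_eval` is a hypothetical placeholder standing in for training HUGE and returning a detection metric such as AUROC; it is not the authors' implementation.

```python
# Minimal sketch of a hyper-parameter sensitivity sweep, assuming a
# hypothetical train_and_eval(alpha, seed) that trains a model and
# returns a detection score (e.g. AUROC). Not the paper's actual code.
import random

def train_and_eval(alpha: float, seed: int) -> float:
    """Placeholder: pretend to train with `alpha` and return a dummy score."""
    rng = random.Random(seed)
    return 0.8 + 0.05 * rng.random()  # dummy value in [0.8, 0.85)

def sensitivity_sweep(alphas, seeds):
    """For each alpha, average the metric over all seeds (the paper's protocol)."""
    results = {}
    for alpha in alphas:
        scores = [train_and_eval(alpha, s) for s in seeds]
        results[alpha] = sum(scores) / len(scores)
    return results

# Same grid as the paper: alpha in {0.0, 0.5, 1.0, 1.5, 2.0}, five seeds.
results = sensitivity_sweep(alphas=[0.0, 0.5, 1.0, 1.5, 2.0], seeds=range(5))
best_alpha = max(results, key=results.get)
```

With a real training routine substituted for the placeholder, `results` would hold the per-α averages plotted in Figure 3(a), and `best_alpha` would identify the reported optimum (α = 0.5 on most datasets).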