A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

Authors: Junjun Pan, Yixin Liu, Xin Zheng, Yizhen Zheng, Alan Wee-Chung Liew, Fuyi Li, Shirui Pan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on 6 datasets demonstrate that HUGE significantly outperforms competitors, showcasing its effectiveness and robustness."
Researcher Affiliation | Academia | 1. School of Information and Communication Technology, Griffith University, Queensland, Australia; 2. Faculty of Information Technology, Monash University, Melbourne, Australia; 3. Faculty of Health and Medical Sciences, The University of Adelaide, South Australia, Australia
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing code, a link to a code repository, or any mention of code in supplementary materials for the described methodology.
Open Datasets | Yes | "We conduct experiments on six public real-world GFD datasets, covering domains ranging from social networks to e-commerce, including Amazon (Dou et al. 2020), Facebook (Xu et al. 2022), Reddit, Yelp Chi (Kumar, Zhang, and Leskovec 2019), Amazon Full (McAuley and Leskovec 2013), and Yelp Chi Full (Rayana and Akoglu 2015)."
Dataset Splits | No | The paper reports average performance over five runs with different random seeds on six real-world GFD datasets, but the main text does not specify how the datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts).
Hardware Specification | Yes | "OOM indicates out-of-memory on a 24GB GPU."
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "To evaluate the sensitivity of HUGE to the hyper-parameter α, we adjust its value across {0.0, 0.5, 1.0, 1.5, 2.0}. The results are illustrated in Figure 3(a). Overall, HUGE is not sensitive to variation in α on Facebook and Reddit, and slightly sensitive on Amazon and Yelp Chi. Moreover, HUGE achieves the best performance at α = 0.5, with a noticeable drop when α = 0 on Amazon and Facebook. These findings suggest HUGE performs best when neighbor information and ego information are balanced. Additionally, the small performance gap between α = 0 and α = 1 on Reddit and Yelp Chi shows that these datasets require less neighbor information for effective fraud detection."
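The sensitivity protocol quoted above (a grid of α values, each averaged over five seeded runs) can be sketched as a simple sweep. Here `train_and_eval` is a hypothetical placeholder standing in for training HUGE and returning a detection metric such as AUROC; it is not the authors' implementation.

```python
# Minimal sketch of a hyper-parameter sensitivity sweep, assuming a
# hypothetical train_and_eval(alpha, seed) that trains a model and
# returns a detection score (e.g. AUROC). Not the paper's actual code.
import random

def train_and_eval(alpha: float, seed: int) -> float:
    """Placeholder: pretend to train with `alpha` and return a dummy score."""
    rng = random.Random(seed)
    return 0.8 + 0.05 * rng.random()  # dummy value in [0.8, 0.85)

def sensitivity_sweep(alphas, seeds):
    """For each alpha, average the metric over all seeds (the paper's protocol)."""
    results = {}
    for alpha in alphas:
        scores = [train_and_eval(alpha, s) for s in seeds]
        results[alpha] = sum(scores) / len(scores)
    return results

# Same grid as the paper: alpha in {0.0, 0.5, 1.0, 1.5, 2.0}, five seeds.
results = sensitivity_sweep(alphas=[0.0, 0.5, 1.0, 1.5, 2.0], seeds=range(5))
best_alpha = max(results, key=results.get)
```

With a real training routine substituted for the placeholder, `results` would hold the per-α averages plotted in Figure 3(a), and `best_alpha` would identify the reported optimum (α = 0.5 on most datasets).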