Rethinking Cancer Gene Identification Through Graph Anomaly Analysis

Authors: Yilong Zang, Lingfei Ren, Yue Li, Zhikang Wang, David Antony Selby, Zheng Wang, Sebastian Josef Vollmer, Hongzhi Yin, Jiangning Song, Junhang Wu

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on two reprocessed datasets STRINGdb and CPDB, and the experimental results demonstrate the superiority of HIPGNN. Extensive experiments on these datasets demonstrate the superior performance of the proposed HIPGNN compared to state-of-the-art methods. Table 1 presents the results of HIPGNN and other baseline methods with training ratios of 20% and 80%. Ablation Analysis.
Researcher Affiliation | Academia | 1 School of Hotel and Tourism Management, The Hong Kong Polytechnic University; 2 School of Computing and Artificial Intelligence, Southwest University of Finance and Economics; 3 School of Computer Science, Wuhan University; 4 Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University; 5 German Research Center for Artificial Intelligence (DFKI) and RPTU Kaiserslautern; 6 The University of Queensland; 7 College of Information Science and Technology, Shihezi University
Pseudocode | No | The paper describes methods and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and Data: https://github.com/zyl199710/HIPGNN
Open Datasets | Yes | Building on previous works (Schulte-Sasse et al. 2021; Cui et al. 2023), we reprocessed two datasets, STRINGdb and CPDB, which contain real-world PPIs and cancer gene data, to extract more comprehensive protein interaction information. We name these two datasets directly after the PPI databases: STRINGdb and CPDB.
Dataset Splits | Yes | Table 1 presents the results of HIPGNN and other baseline methods with training ratios of 20% and 80%. The left subfigure of Figure 4 shows box plots of the results, where HIPGNN with only spectral graph representation still outperforms, highlighting the effectiveness of spectral eigenvalue encoding. The right subfigure of Figure 4, using a 20% training ratio and five-fold cross-validation, shows that both contexts improve HIPGNN's performance, with confidence context being particularly impactful.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for running experiments. It mentions 'large size of the graph' and 'computational complexity' but no hardware specifics.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks like PyTorch or TensorFlow versions.
Experiment Setup | Yes | Loss weights: For the final loss computation, we used α, β, and γ to weight the protein interaction context, interaction confidence context, and cancer gene label loss, respectively. We empirically set α = 0.01 on STRINGdb and α = 0.02 on CPDB, with β = 2/3(1 - α) and γ = 1/3(1 - α). We evaluated HIPGNN's performance with q values ranging from 1,000 to 4,000.
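
The loss weighting quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It reads β as 2/3(1 - α) and γ as 1/3(1 - α), so that the three weights sum to 1, and it assumes the final loss is a plain weighted sum of the three terms; the excerpt does not state the exact combination. The function names `loss_weights` and `total_loss` and the dictionary `ALPHA_BY_DATASET` are hypothetical and are not taken from the HIPGNN repository.

```python
# Illustrative sketch only: names below are hypothetical, not from the HIPGNN code.

# Alpha values reported in the Experiment Setup row.
ALPHA_BY_DATASET = {"STRINGdb": 0.01, "CPDB": 0.02}


def loss_weights(alpha: float) -> tuple[float, float, float]:
    """Return (alpha, beta, gamma) with beta = 2/3 (1 - alpha), gamma = 1/3 (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 2.0 / 3.0 * (1.0 - alpha)
    gamma = 1.0 / 3.0 * (1.0 - alpha)
    return alpha, beta, gamma


def total_loss(interaction_loss: float, confidence_loss: float,
               label_loss: float, alpha: float) -> float:
    """Assumed combination: a weighted sum of the three loss terms.

    alpha weights the protein interaction context loss, beta the interaction
    confidence context loss, and gamma the cancer gene label loss.
    """
    a, b, g = loss_weights(alpha)
    return a * interaction_loss + b * confidence_loss + g * label_loss


if __name__ == "__main__":
    for name, alpha in ALPHA_BY_DATASET.items():
        a, b, g = loss_weights(alpha)
        print(f"{name}: alpha={a:.4f}, beta={b:.4f}, gamma={g:.4f}")
```

Under this reading, most of the weight falls on the confidence context term and the label term, with the interaction context term contributing only 1 to 2 percent of the total.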