reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rethinking Cancer Gene Identification Through Graph Anomaly Analysis

Authors: Yilong Zang, Lingfei Ren, Yue Li, Zhikang Wang, David Antony Selby, Zheng Wang, Sebastian Josef Vollmer, Hongzhi Yin, Jiangning Song, Junhang Wu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments are conducted on two reprocessed datasets STRINGdb and CPDB, and the experimental results demonstrate the superiority of HIPGNN. Extensive experiments on these datasets demonstrate the superior performance of the proposed HIPGNN compared to state-of-the-art methods. Table 1 presents the results of HIPGNN and other baseline methods with training ratios of 20% and 80%. Ablation Analysis.
Researcher Affiliation	Academia	(1) School of Hotel and Tourism Management, The Hong Kong Polytechnic University; (2) School of Computing and Artificial Intelligence, Southwest University of Finance and Economics; (3) School of Computer Science, Wuhan University; (4) Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University; (5) German Research Center for Artificial Intelligence (DFKI) and RPTU Kaiserslautern; (6) The University of Queensland; (7) College of Information Science and Technology, Shihezi University
Pseudocode	No	The paper describes methods and mathematical formulations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code and Data https://github.com/zyl199710/HIPGNN
Open Datasets	Yes	Building on previous works (Schulte-Sasse et al. 2021; Cui et al. 2023), we reprocessed two datasets, STRINGdb and CPDB, which contain real-world PPIs and cancer gene data, to extract more comprehensive protein interaction information. We name these two datasets directly after the PPI databases: STRINGdb and CPDB.
Dataset Splits	Yes	Table 1 presents the results of HIPGNN and other baseline methods with training ratios of 20% and 80%. The left subfigure of Figure 4 shows box plots of the results, where HIPGNN with only spectral graph representation still outperforms, highlighting the effectiveness of spectral eigenvalue encoding. The right subfigure of Figure 4, using a 20% training ratio and five-fold cross-validation, shows that both contexts improve HIPGNN's performance, with confidence context being particularly impactful.
Hardware Specification	No	The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for running experiments. It mentions 'large size of the graph' and 'computational complexity' but no hardware specifics.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks like PyTorch or TensorFlow versions.
Experiment Setup	Yes	Loss weights: For the final loss computation, we used α, β, and γ to weight the protein interaction context, interaction confidence context, and cancer gene label loss, respectively. We empirically set α = 0.01 on STRINGdb and α = 0.02 on CPDB, with β = 2/3(1 − α) and γ = 1/3(1 − α). We evaluated HIPGNN's performance with q values ranging from 1,000 to 4,000.
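
The loss weighting quoted in the Experiment Setup row can be illustrated with a short sketch. It assumes the factor in the β and γ formulas is (1 − α), so that the three weights sum to 1, and that the final loss is a plain weighted sum of the three terms; the excerpt does not state the exact combination. The function names `loss_weights` and `combined_loss` are hypothetical and are not taken from the HIPGNN repository.

```python
def loss_weights(alpha: float) -> tuple[float, float, float]:
    """Return (alpha, beta, gamma) for a given alpha.

    Assumes beta = 2/3 * (1 - alpha) and gamma = 1/3 * (1 - alpha),
    which makes the three weights sum to 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 2.0 / 3.0 * (1.0 - alpha)
    gamma = 1.0 / 3.0 * (1.0 - alpha)
    return alpha, beta, gamma


def combined_loss(l_interaction: float, l_confidence: float,
                  l_label: float, alpha: float) -> float:
    """Weighted sum of the three loss terms (assumed form, not confirmed by the excerpt).

    alpha weights the protein interaction context loss, beta the
    interaction confidence context loss, gamma the cancer gene label loss.
    """
    a, b, g = loss_weights(alpha)
    return a * l_interaction + b * l_confidence + g * l_label


if __name__ == "__main__":
    # Alpha values reported in the excerpt for the two datasets.
    for name, alpha in [("STRINGdb", 0.01), ("CPDB", 0.02)]:
        a, b, g = loss_weights(alpha)
        print(f"{name}: alpha={a:.4f}, beta={b:.4f}, gamma={g:.4f}")
```

Under these assumptions, the auxiliary interaction context term receives a very small share of the total loss on both datasets, with the remaining weight split 2:1 between the confidence context and the label loss.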