Test-Time Graph Neural Dataset Search With Generative Projection
Authors: Xin Zheng, Wei Huang, Chuan Zhou, Ming Li, Shirui Pan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world graphs demonstrate the superior test-time adaptation ability of the proposed PGNDS for inference on well-trained GNN models. Section 3 (Experiments) verifies the effectiveness of PGNDS in terms of test-time inference performance, aiming to answer the following questions: Q1: How does PGNDS perform on a well-trained GNN for both graph classification and regression tasks when faced with unknown graph distribution shifts at test time? Q2: How does PGNDS perform in ablation studies focusing on each component? Q3: How sensitive is PGNDS to variations in hyper-parameters? Q4: How does PGNDS perform in terms of running-time efficiency? Table 1. ROC-AUC performance (↑) comparison between baseline methods and PGNDS on molecular and protein graphs for the graph classification task. Table 2. RMSE performance (↓) comparison between baseline methods and PGNDS on molecular graphs for the graph regression task. |
| Researcher Affiliation | Academia | 1School of Information and Communication Technology, Griffith University, Gold Coast, Australia. 2RIKEN Center for Advanced Intelligence Project, Tokyo, Japan. 3Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China. 4Zhejiang Institute of Optoelectronics, Jinhua, China; Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua, China. |
| Pseudocode | No | The paper describes the methodology in detailed text, including equations for the diffusion process and rectification functions. It outlines the three key modules (dual conditional diffusion, dynamic search, ensemble inference) and their steps. However, there are no explicitly labeled pseudocode blocks or algorithms with structured steps formatted like code. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the code for the described methodology, nor does it include a link to a code repository. It mentions other methods lacking publicly available code but does not provide its own. |
| Open Datasets | Yes | Datasets & Metrics. We perform experiments on six real-world graph datasets covering protein and molecular graphs, with four graph classification tasks and five graph regression tasks. We use the area under the ROC curve (ROC-AUC) to evaluate the graph classification task and root mean square error (RMSE) for the graph regression task. Higher ROC-AUC (↑) and lower RMSE (↓) indicate better graph learning performance. More details of datasets are listed in Appendix B. Note that the original QM9 dataset contains nineteen tasks, but we selected four of them (i.e., A, B, C, and alpha) for our experiments. For all training and test graphs, we follow the processing procedures and splits in previous works (Liu et al., 2024; Jo et al., 2022). Table A2. Dataset statistics for graph classification and regression on protein and molecular graphs. |
| Dataset Splits | Yes | More details of datasets are listed in Appendix B. Note that the original QM9 dataset contains nineteen tasks, but we selected four of them (i.e., A, B, C, and alpha) for our experiments. For all training and test graphs, we follow the processing procedures and splits in previous works (Liu et al., 2024; Jo et al., 2022). Table A2 (dataset statistics) reports the following train / test splits: Enzymes (classification, 587 graphs) 470 / 117; Ogbg-BACE (classification, 1,513 graphs) 1,210 / 152; Ogbg-BBBP (classification, 2,039 graphs) 1,631 / 204; Ogbg-ClinTox (classification, 1,477 graphs) 1,181 / 148; Ogbg-FreeSolv (regression, 642 graphs) 513 / 65; QM9 (regression, 133,885 graphs) 120,803 / 13,082. |
| Hardware Specification | Yes | Table 4. Running time (in seconds) comparison on graph classification task in 5 epochs with a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not explicitly state any specific software dependencies with version numbers for their implementation. It mentions using "the classic GIN model (Xu et al., 2018)" but without specific versioning for GIN or its underlying libraries. |
| Experiment Setup | No | The paper discusses hyper-parameters for controlling conditional constraints (α, β, and γ) and provides a hyper-parameter sensitivity analysis (Fig. 5, Fig. A1). However, it does not explicitly list the learning rate, batch size, optimizer type, number of epochs, or other system-level training parameters for the GNN model or the diffusion model. It mentions that "the well-trained GNN model can be denoted as GNN_θtr with optimal weight parameters θtr" and refers to "training procedures for both the GNN model and the graph generative model (i.e., diffusion model) as essential preliminary steps" without detailing these setups. |
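For context on the two evaluation metrics quoted above (ROC-AUC for graph classification, RMSE for graph regression), here is a minimal pure-Python sketch of how each can be computed. This is an illustrative helper written for this review, not code from the paper; the function names are our own:

```python
import math

def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative example (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(targets, preds):
    """Root mean square error between regression targets and predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)
    )
```

Higher ROC-AUC and lower RMSE indicate better performance, matching the ↑/↓ conventions in the paper's Tables 1 and 2.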