Online GNN Evaluation Under Test-time Graph Distribution Shifts
Authors: Xin Zheng, Dongjin Song, Qingsong Wen, Bo Du, Shirui Pan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world test graphs under diverse graph distribution shifts verify the effectiveness of the proposed method, revealing its strong correlation with ground-truth test errors on various well-trained GNN models. |
| Researcher Affiliation | Collaboration | Xin Zheng, Monash University, Melbourne, Australia; Dongjin Song, University of Connecticut, Storrs, USA; Qingsong Wen, Squirrel AI, Bellevue, USA; Bo Du, Wuhan University, Wuhan, China; Shirui Pan, Griffith University, Queensland, Australia |
| Pseudocode | Yes | Algorithm 1 Learning Behavior Discrepancy (LEBED) Score Computation. |
| Open Source Code | Yes | Code is available at https://github.com/Amanda-Zheng/LEBED |
| Open Datasets | Yes | We perform experiments on six real-world graph datasets with diverse graph data distribution shifts, including node feature shifts (Wu et al., 2022; Jin et al., 2023b), domain shifts (Wu et al., 2020), and temporal shifts (Wu et al., 2022). Detailed statistics of all these datasets are listed in Table A1 in Appendix B. |
| Dataset Splits | Yes | For all training and validation graphs, we follow the processing procedures and splits of Wu et al. (2022) and Wu et al. (2020). |
| Hardware Specification | Yes | The running time comparison on Citationv2 (in seconds) is shown in Fig. 3, measured on a single GeForce RTX 3080 GPU with 200 iterations for w/ D_stru. |
| Software Dependencies | No | In our experiments, we use the PyTorch Geometric library (Fey & Lenssen, 2019) and four GeForce RTX 3080 GPUs for all implementations. However, specific version numbers for the software dependencies are not provided. |
| Experiment Setup | Yes | More details of these well-trained GNN models, including architectures, training hyper-parameters, and ground-truth test error distributions, are provided in Appendix D. We report the correlation between the proposed LEBED score and the ground-truth test errors on unseen, unlabeled test graphs with distribution shifts, using R² and the rank correlation Spearman's ρ. R² ranges over [0, 1] and measures the degree of linear fit between two variables: the closer it is to 1, the stronger the linear correlation. Spearman's ρ ranges over [−1, 1] and measures the monotonic correlation between two variables, with 1 indicating a perfectly positive and −1 a perfectly negative monotonic correlation. |
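
The two evaluation metrics in the Experiment Setup row can be made concrete with a small, dependency-free sketch. It assumes R² is the coefficient of determination of an ordinary least-squares line (with intercept) fitted between the scores and the ground-truth test errors, which for a single predictor equals the squared Pearson correlation, and that Spearman's ρ is the Pearson correlation of tie-averaged ranks. The function names are hypothetical and are not taken from the LEBED repository.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares line y ~ a + b*x.

    Assumption: simple linear regression with an intercept, where
    R^2 equals pearson_r(x, y) ** 2 and therefore lies in [0, 1].
    """
    return pearson_r(x, y) ** 2


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks, in [-1, 1]."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-test-graph scores and ground-truth test errors (illustrative only).
    scores = [0.12, 0.35, 0.20, 0.50, 0.41]
    errors = [0.10, 0.30, 0.25, 0.55, 0.40]
    print(f"R^2 = {r_squared(scores, errors):.3f}")
    print(f"Spearman's rho = {spearman_rho(scores, errors):.3f}")
```

In this toy example the scores and errors are ordered identically, so Spearman's ρ is exactly 1 even though the relationship is not perfectly linear (R² < 1). This illustrates why the two metrics are reported together: ρ captures monotonic agreement, while R² captures linear fit.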