Mind the Gap: Mitigating the Distribution Gap in Graph Few-shot Learning

Authors: Chunhui Zhang, Hongfu Liu, Jundong Li, Yanfang Ye, Chuxu Zhang

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through comprehensive experiments, we demonstrate that SDGFL outperforms state-of-the-art baselines on various graph mining tasks across multiple datasets in the few-shot scenario. We also provide a quantitative measurement of SDGFL's superior performance in comparison to existing methods." |
| Researcher Affiliation | Academia | Chunhui Zhang (Brandeis University, MA, USA); Hongfu Liu (Brandeis University, MA, USA); Jundong Li (University of Virginia, VA, USA); Yanfang Ye (University of Notre Dame, IN, USA); Chuxu Zhang (Brandeis University, MA, USA) |
| Pseudocode | No | The paper describes the methodology in Section 4 "Methodology" and illustrates it with Figure 1, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references code for various baseline methods via footnote links in Appendix B, but it does not provide any statement or link for the source code of the proposed SDGFL method. |
| Open Datasets | Yes | "Specifically, for the node classification task, we used ogbn-arxiv (Hu et al., 2020a), Tissue-PPI (Hamilton et al., 2017), Fold-PPI (Zitnik & Leskovec, 2017), Cora (Sen et al., 2008), and Citeseer (Sen et al., 2008). For the graph classification task, we used the datasets in (Chauhan et al., 2020), namely, Letter-High, Triangles, Reddit-12K, and Enzymes. Appendix A provides more detailed information about the datasets." |
| Dataset Splits | Yes | For the graph classification task, per-dataset statistics and train/validation/test splits are reported in Table 6 (e.g., "Letter-High 11 4 1,330 320 600..."). "In the inductive setting (SDGCL-I), we only use the unlabeled data in the training set, whereas in the transductive setting (SDGCL-T), we use the unlabeled data in both the training and testing sets." |
| Hardware Specification | Yes | "Our SDGCL implementation is based on PyTorch, and we train it on NVIDIA V100 GPUs." |
| Software Dependencies | No | PyTorch is mentioned, but no version number is provided, and no other specific software dependencies with versions are listed for the authors' implementation. |
| Experiment Setup | Yes | "To augment graph data, we randomly drop 15% of the nodes, remove 15% of the edges, and mask 20% of the node features. The mini-batch size is set to 2,048, and we use a learning rate of 0.05 with a decay factor of 0.9. Furthermore, we set the τ value for exponential moving average to 0.999." |
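Since the paper provides no code, the quoted setup can only be approximated. The sketch below illustrates the two reported mechanics in dependency-free Python: the graph augmentations (drop 15% of nodes, remove 15% of edges, mask 20% of features) and the exponential-moving-average update with τ = 0.999. The function names (`augment_graph`, `ema_update`), the edge-list/dict graph representation, and the seeding are illustrative assumptions, not the authors' implementation (which is PyTorch-based).

```python
import random

def augment_graph(edges, features, node_drop=0.15, edge_drop=0.15,
                  feat_mask=0.20, seed=0):
    """Hypothetical sketch of the paper's augmentation recipe.

    `edges` is a list of (u, v) pairs; `features` maps node id -> list of
    floats. Rates default to the values quoted in the paper.
    """
    rng = random.Random(seed)
    nodes = set(features) | {u for e in edges for u in e}
    # Randomly drop ~15% of nodes; all their incident edges go with them.
    kept = {n for n in nodes if rng.random() >= node_drop}
    # Independently remove ~15% of the surviving edges.
    new_edges = [(u, v) for (u, v) in edges
                 if u in kept and v in kept and rng.random() >= edge_drop]
    # Mask (zero out) ~20% of the feature entries of surviving nodes.
    new_feats = {n: [0.0 if rng.random() < feat_mask else x for x in feats]
                 for n, feats in features.items() if n in kept}
    return new_edges, new_feats

def ema_update(target, online, tau=0.999):
    """EMA of parameters: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1 - tau) * o for t, o in zip(target, online)]
```

With τ = 0.999 the target parameters move only 0.1% of the way toward the online parameters per step, which is the usual slow-teacher choice in BYOL-style self-distillation.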