Mind the Gap: Mitigating the Distribution Gap in Graph Few-shot Learning
Authors: Chunhui Zhang, Hongfu Liu, Jundong Li, Yanfang Ye, Chuxu Zhang
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments, we demonstrate that SDGFL outperforms state-of-the-art baselines on various graph mining tasks across multiple datasets in the few-shot scenario. We also provide a quantitative measurement of SDGFL's superior performance in comparison to existing methods. |
| Researcher Affiliation | Academia | Chunhui Zhang EMAIL Brandeis University, MA, USA; Hongfu Liu EMAIL Brandeis University, MA, USA; Jundong Li EMAIL University of Virginia, VA, USA; Yanfang Ye EMAIL University of Notre Dame, IN, USA; Chuxu Zhang EMAIL Brandeis University, MA, USA |
| Pseudocode | No | The paper describes the methodology in Section 4 'Methodology' and illustrates it with Figure 1, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references code for various baseline methods using links in footnotes in Appendix B. However, it does not provide any statement or link for the source code of their proposed SDGFL methodology. |
| Open Datasets | Yes | Specifically, for the node classification task, we used ogbn-arxiv (Hu et al., 2020a), Tissue-PPI (Hamilton et al., 2017), Fold-PPI (Zitnik & Leskovec, 2017), Cora (Sen et al., 2008), and Citeseer (Sen et al., 2008). For the graph classification task, we used the datasets in (Chauhan et al., 2020), namely, Letter-High, Triangles, Reddit-12K, and Enzymes. Appendix A provides more detailed information about the datasets. |
| Dataset Splits | Yes | For the graph classification task... The statistics of datasets are reported in Table 6. E.g., Letter-High: 11 training classes / 4 test classes; 1,330 / 320 / 600 training / validation / test graphs... In the inductive setting (SDGCL-I), we only use the unlabeled data in the training set, whereas in the transductive setting (SDGCL-T), we use the unlabeled data in both the training and testing sets. |
| Hardware Specification | Yes | Our SDGCL implementation is based on PyTorch, and we train it on NVIDIA V100 GPUs. |
| Software Dependencies | No | Our SDGCL implementation is based on PyTorch, and we train it on NVIDIA V100 GPUs. (PyTorch is mentioned but no version number is provided, and no other specific software dependencies with versions are listed for the authors' implementation.) |
| Experiment Setup | Yes | To augment graph data, we randomly drop 15% of the nodes, remove 15% of the edges, and mask 20% of the node features. The mini-batch size is set to 2,048, and we use a learning rate of 0.05 with a decay factor of 0.9. Furthermore, we set the τ value for exponential moving average to 0.999. |
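The augmentation and EMA hyperparameters quoted above can be sketched in plain Python. This is a hedged illustration of the stated setup, not the authors' code (which is not released): the function names `augment_graph` and `ema_update` are hypothetical, and a real implementation would operate on PyG/DGL tensors rather than Python dicts.

```python
import random

def augment_graph(nodes, edges, features,
                  drop_node_p=0.15, drop_edge_p=0.15, mask_feat_p=0.20,
                  rng=None):
    """Random graph augmentation matching the paper's stated setup:
    drop 15% of nodes, remove 15% of edges, mask 20% of feature entries."""
    rng = rng or random.Random(0)
    # Drop nodes independently with probability drop_node_p.
    kept_nodes = {n for n in nodes if rng.random() >= drop_node_p}
    # Keep an edge only if both endpoints survive and it is not dropped.
    kept_edges = [(u, v) for (u, v) in edges
                  if u in kept_nodes and v in kept_nodes
                  and rng.random() >= drop_edge_p]
    # Mask 20% of feature entries (set to zero) on surviving nodes.
    masked = {n: [0.0 if rng.random() < mask_feat_p else x for x in feats]
              for n, feats in features.items() if n in kept_nodes}
    return kept_nodes, kept_edges, masked

def ema_update(target, online, tau=0.999):
    """Exponential moving average of parameters with the quoted tau = 0.999."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]
```

For example, `ema_update([1.0], [0.0])` moves the target parameter only 0.1% of the way toward the online parameter per step, which is why a large τ like 0.999 yields a slowly evolving teacher network.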