Mind the Gap: Mitigating the Distribution Gap in Graph Few-shot Learning

Authors: Chunhui Zhang, Hongfu Liu, Jundong Li, Yanfang Ye, Chuxu Zhang

TMLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Through comprehensive experiments, we demonstrate that SDGFL outperforms state-of-the-art baselines on various graph mining tasks across multiple datasets in the few-shot scenario. We also provide a quantitative measurement of SDGFL's superior performance in comparison to existing methods." |
| Researcher Affiliation | Academia | Chunhui Zhang (Brandeis University, MA, USA); Hongfu Liu (Brandeis University, MA, USA); Jundong Li (University of Virginia, VA, USA); Yanfang Ye (University of Notre Dame, IN, USA); Chuxu Zhang (Brandeis University, MA, USA) |
| Pseudocode | No | The paper describes the methodology in Section 4 "Methodology" and illustrates it with Figure 1, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references code for various baseline methods via footnote links in Appendix B, but it does not provide any statement or link for the source code of the proposed SDGFL method. |
| Open Datasets | Yes | "Specifically, for the node classification task, we used ogbn-arxiv (Hu et al., 2020a), Tissue-PPI (Hamilton et al., 2017), Fold-PPI (Zitnik & Leskovec, 2017), Cora (Sen et al., 2008), and Citeseer (Sen et al., 2008). For the graph classification task, we used the datasets in (Chauhan et al., 2020), namely, Letter-High, Triangles, Reddit-12K, and Enzymes. Appendix A provides more detailed information about the datasets." |
| Dataset Splits | Yes | For the graph classification task, per-dataset statistics and train/validation/test splits are reported in Table 6 (e.g., "Letter-High 11 4 1,330 320 600..."). "In the inductive setting (SDGCL-I), we only use the unlabeled data in the training set, whereas in the transductive setting (SDGCL-T), we use the unlabeled data in both the training and testing sets." |
| Hardware Specification | Yes | "Our SDGCL implementation is based on PyTorch, and we train it on NVIDIA V100 GPUs." |
| Software Dependencies | No | PyTorch is mentioned, but no version number is provided, and no other specific software dependencies with versions are listed for the authors' implementation. |
| Experiment Setup | Yes | "To augment graph data, we randomly drop 15% of the nodes, remove 15% of the edges, and mask 20% of the node features. The mini-batch size is set to 2,048, and we use a learning rate of 0.05 with a decay factor of 0.9. Furthermore, we set the τ value for exponential moving average to 0.999." |
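Since the paper provides no code, the quoted setup can only be approximated. The sketch below illustrates the two reported mechanics in dependency-free Python: the graph augmentations (drop 15% of nodes, remove 15% of edges, mask 20% of features) and the exponential-moving-average update with τ = 0.999. The function names (`augment_graph`, `ema_update`), the edge-list/dict graph representation, and the seeding are illustrative assumptions, not the authors' implementation (which is PyTorch-based).

```python
import random

def augment_graph(edges, features, node_drop=0.15, edge_drop=0.15,
                  feat_mask=0.20, seed=0):
    """Hypothetical sketch of the paper's augmentation recipe.

    `edges` is a list of (u, v) pairs; `features` maps node id -> list of
    floats. Rates default to the values quoted in the paper.
    """
    rng = random.Random(seed)
    nodes = set(features) | {u for e in edges for u in e}
    # Randomly drop ~15% of nodes; all their incident edges go with them.
    kept = {n for n in nodes if rng.random() >= node_drop}
    # Independently remove ~15% of the surviving edges.
    new_edges = [(u, v) for (u, v) in edges
                 if u in kept and v in kept and rng.random() >= edge_drop]
    # Mask (zero out) ~20% of the feature entries of surviving nodes.
    new_feats = {n: [0.0 if rng.random() < feat_mask else x for x in feats]
                 for n, feats in features.items() if n in kept}
    return new_edges, new_feats

def ema_update(target, online, tau=0.999):
    """EMA of parameters: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1 - tau) * o for t, o in zip(target, online)]
```

With τ = 0.999 the target parameters move only 0.1% of the way toward the online parameters per step, which is the usual slow-teacher choice in BYOL-style self-distillation.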