Simplifying Node Classification on Heterophilous Graphs with Compatible Label Propagation

Authors: Zhiqiang Zhong, Sergey Ivanov, Jun Pang

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type: Experimental. The paper states: 'On a wide variety of benchmarks, we show that our approach achieves the leading performance on graphs with various levels of homophily. Meanwhile, it has orders of magnitude fewer parameters and requires less execution time.' and 'Empirically, extensive experimental results on a wide variety of benchmarks show the competitive and efficient performance of CLP.' Section 6 is titled 'Experiments' and contains subsections 'Datasets', 'Experimental Setup', 'Results on Real-world Graphs', 'Results on Synthetic Graphs', and 'Additional Analysis', all of which discuss empirical results and comparisons.
Researcher Affiliation: Collaboration. Zhiqiang Zhong (University of Luxembourg), Sergey Ivanov (Criteo), Jun Pang (University of Luxembourg).
Pseudocode: No. The paper describes the steps of the CLP model in Section 5 ('Compatible Label Propagation with Heterophily') and illustrates them conceptually in Figure 2, but it presents no formal pseudocode block or algorithm listing. For example, 'Our approach starts with a simple base predictor on raw node features... After, we propose an approach to estimate the compatibility matrix H... Finally, we use label propagation algorithm...' is descriptive rather than algorithmic.
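To make the described three-step pipeline concrete, here is a minimal sketch of compatibility-guided label propagation. It is an illustrative reconstruction, not the paper's implementation: the compatibility estimator below (class-to-class link counts over edges incident to training nodes, row-normalized) and the propagation update are common choices for this family of methods; the paper's exact formulas may differ.

```python
import numpy as np

def estimate_compatibility(A, soft_labels, train_mask, labels_onehot):
    # H[i, j] ~ probability that a node of class i links to a node of class j.
    # Estimated from edges incident to training nodes: class-to-class link
    # counts, then row-normalized (a common heuristic, assumed here).
    M = labels_onehot * train_mask[:, None]        # keep only known labels
    H = M.T @ A @ soft_labels                      # class-to-class link mass
    return H / np.clip(H.sum(axis=1, keepdims=True), 1e-12, None)

def compatible_label_propagation(A, base_pred, H, alpha=0.9, n_iter=50):
    # Propagate base predictions, mixing neighbour beliefs through H instead
    # of assuming neighbours share the same label (the homophily assumption).
    deg = np.clip(A.sum(axis=1, keepdims=True), 1, None)
    P = base_pred.copy()
    for _ in range(n_iter):
        P = (1 - alpha) * base_pred + alpha * (A @ P @ H) / deg
        P = P / np.clip(P.sum(axis=1, keepdims=True), 1e-12, None)
    return P
```

On a homophilous graph H is close to the identity and the update reduces to ordinary label propagation; on a heterophilous graph the off-diagonal mass of H lets a neighbour's class vote for a *different* class at the centre node.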
Open Source Code: Yes. 'Our implementation is available at https://github.com/zhiqiangzhongddu/TMLR-CLP.'
Open Datasets: Yes. 'We use a total of 19 real-world datasets (Texas, Wisconsin, Actor, Squirrel, Chameleon, USA-Airports, Brazil-Airports, Wiki, Cornell, Europe-Airports, deezer-europe, Twitch-EN, Twitch-RU, Ogbn-Proteins, Wiki CS, DBLP, CS, ACM, Physics) in diverse domains... See Appendix C for detailed descriptions, statistics and references.' For datasets with real-world contextual node features, the paper first establishes a class mapping ψ : Y → Yb between classes in the synthetic graph Y and classes of the existing benchmark graph Yb. 'In this paper, we adopt the large-scale benchmark, Ogbn-Products (Hu et al., 2020).'
Dataset Splits: Yes. 'We consider three different choices for the random split into training/validation/test settings, which we call sparse splitting (5%/5%/90%), medium splitting (10%/10%/80%) and dense splitting (48%/32%/20%), respectively.' The sparse splitting (5%/5%/90%) is similar to the original semi-supervised setting in Kipf & Welling (2017), but each class is not restricted to have the same number of training instances, since that is closer to real-world applications. 'For a fair comparison, we generate 10 fixed split instances' and results are summarised over 10 runs with random seeds. Note that the Ogbn-Proteins dataset uses its default splitting.
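The splitting protocol described above is straightforward to reproduce. The sketch below is an assumed reading of it (a plain random permutation per seed, with no per-class balancing, which matches the stated departure from the Planetoid setting); the repository's actual split code may differ in detail.

```python
import numpy as np

def random_split(n_nodes, train_frac, val_frac, seed):
    # One random train/val/test split. Unlike the Planetoid-style setting,
    # no per-class balancing is enforced (closer to real-world label budgets).
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    n_train = int(train_frac * n_nodes)
    n_val = int(val_frac * n_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# The three regimes from the paper, with 10 fixed seeds each:
regimes = {"sparse": (0.05, 0.05), "medium": (0.10, 0.10), "dense": (0.48, 0.32)}
splits = {name: [random_split(1000, tr, va, seed) for seed in range(10)]
          for name, (tr, va) in regimes.items()}
```

Fixing the seeds and saving the resulting index arrays is what makes the 10 split instances reusable across all compared models.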
Hardware Specification: No. The paper discusses execution time and model size, stating that CLP 'requires less execution time' and reporting execution times in Figure 8, but it does not specify the hardware (e.g., GPU model, CPU type) on which the experiments were run.
Software Dependencies: No. The paper points to its implementation ('Our implementation is available at https://github.com/zhiqiangzhongddu/TMLR-CLP.'), but the main text does not list specific software dependencies or their version numbers (e.g., Python, PyTorch, or CUDA versions, or specific libraries with versions).
Experiment Setup: No. The paper states: 'Other model setups and hyperparameter settings can be found in Appendix E.' Concrete hyperparameter values and detailed training configurations are therefore not provided in the main text.