Parameter Efficient Node Classification on Homophilic Graphs

Authors: Lucas Prieto, Jeroen Den Boef, Paul Groth, Joran Cornelisse

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type: Experimental — We propose Graph Non-Parametric Diffusion (GNPD), a method that outperforms traditional GNNs using only two linear models and non-parametric diffusion. Our method combines ideas from Correct & Smooth (C&S) and the Scalable Inception Graph Network (SIGN) into a simpler model that outperforms both on several datasets. It achieves unmatched parameter efficiency, competing with models that have two orders of magnitude more parameters. Additionally, GNPD can forego spectral embeddings, which are the computational bottleneck of the C&S method.
Researcher Affiliation: Collaboration — Lucas Prieto (EMAIL; Socialdatabase; University of Amsterdam), Jeroen Den Boef (EMAIL; Socialdatabase), Paul Groth (EMAIL; University of Amsterdam), Joran Cornelisse (EMAIL; Socialdatabase)
Pseudocode: No — The paper describes its methods using text, mathematical equations (Equations 1–15), and schematic diagrams (Figures 1–3), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No — The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology. It mentions "Reviewed on OpenReview: https://openreview.net/forum?id=XXXX", but this is a review link, not code access.
Open Datasets: Yes — The statistics of the datasets used in this paper are described in Table 1. Open Graph Benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687, 2020.
Dataset Splits: Yes — Table 1: Summary statistics for the datasets used in this paper.

Dataset    Nodes      Edges       Classes  Train/Val/Test
arxiv      169,343    1,166,243   40       54%/18%/28%
Products   2,449,029  61,859,140  47       10%/2%/88%
Pubmed     19,717     44,338      3        92%/3%/5%
Citeseer   3,327      4,732       6        55%/15%/30%
Hardware Specification: No — The paper discusses computational efficiency and runtimes, particularly in Section 5 and Figure 5, noting the benefits of foregoing spectral embeddings. However, it does not specify the GPU, CPU, or other hardware used to execute the experiments.
Software Dependencies: No — The paper mentions LightGBM (Ke et al., 2017) as a component of its aggregation step, but it does not provide a version number for LightGBM or any other software dependency, which is required for reproducibility.
Experiment Setup: Yes — 6.1 Ablation study: In this section we measure the importance of the different components of our method. While most of the hyper-parameters in this method were inherited from C&S, we introduced the λ parameter to regulate class-specific homophily; we show the sensitivity of our method to this hyper-parameter and to the number of diffusion steps k in Table 5. Table 5: Sensitivity analysis with respect to λ and k (the number of diffusion steps). The results are averaged over 10 runs and shown with standard deviation.
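The "non-parametric diffusion" in the abstract follows the Correct & Smooth lineage the paper cites, where predictions are iteratively smoothed over the graph without learned parameters. Below is a minimal sketch of a C&S-style smoothing update E ← (1 − α)·E⁰ + α·S·E, assuming a row-normalized adjacency matrix; the function name `smooth` and its parameters are illustrative, not taken from the paper:

```python
import numpy as np

def smooth(adj_norm, signal, k=10, alpha=0.8):
    """Iteratively diffuse a node signal over the graph, C&S-style.

    adj_norm: row-normalized adjacency matrix, shape (n, n)
    signal:   initial per-node predictions/labels, shape (n, c)
    k:        number of diffusion steps
    alpha:    mixing weight between neighbors and the original signal
    """
    out = signal.copy()
    for _ in range(k):
        # Blend the original signal with the neighborhood average.
        out = (1 - alpha) * signal + alpha * (adj_norm @ out)
    return out
```

With alpha = 0 the signal is returned unchanged; larger alpha and k push each node's prediction toward its neighborhood consensus, which is why the method relies on homophily.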
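The λ/k sensitivity analysis reported in Table 5 amounts to a grid sweep averaged over repeated runs. A minimal sketch of such a sweep, assuming a user-supplied `evaluate(lam, k, seed)` callable that trains and scores the model (the helper names here are hypothetical, not from the paper):

```python
import itertools
import numpy as np

def sensitivity_sweep(evaluate, lambdas, ks, n_runs=10):
    """Grid-search lambda and k, reporting mean and std over repeated runs.

    evaluate: callable (lam, k, seed) -> accuracy, supplied by the user
    returns:  dict mapping (lam, k) -> (mean_accuracy, std_accuracy)
    """
    results = {}
    for lam, k in itertools.product(lambdas, ks):
        # Re-run with different seeds to estimate the standard deviation.
        scores = [evaluate(lam, k, seed) for seed in range(n_runs)]
        results[(lam, k)] = (float(np.mean(scores)), float(np.std(scores)))
    return results
```

Reporting mean ± std over the grid reproduces the shape of a Table-5-style sensitivity table for any chosen ranges of λ and k.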