Implicit vs Unfolded Graph Neural Networks
Authors: Yongyi Yang, Tang Liu, Yangkun Wang, Zengfeng Huang, David Wipf
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we carefully quantify explicit situations where the solutions they produce are equivalent and others where their properties sharply diverge. This includes the analysis of convergence, representational capacity, and interpretability. In support of this analysis, we also provide empirical head-to-head comparisons across multiple synthetic and public real-world node classification benchmarks. These results indicate that while IGNN is substantially more memory-efficient, UGNN models support unique, integrated graph attention mechanisms and propagation rules that can achieve strong node classification accuracy across disparate regimes such as adversarially-perturbed graphs, graphs with heterophily, and graphs involving long-range dependencies. |
| Researcher Affiliation | Collaboration | Yongyi Yang EMAIL University of Michigan Ann Arbor, United States. Tang Liu EMAIL Alibaba Hangzhou, China. Yangkun Wang EMAIL University of California, San Diego La Jolla, United States. Zengfeng Huang EMAIL Fudan University Shanghai, China. David Wipf EMAIL Amazon Web Services Shanghai, China. |
| Pseudocode | No | The paper describes methods using mathematical equations and textual explanations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps. |
| Open Source Code | Yes | All models and testing were implemented using the Deep Graph Library (DGL) (Wang et al., 2019). We also note that portions of these experiments have appeared previously in our conference paper (Yang et al., 2021), which presents a useful UGNN-related modeling framework and supporting DGL-based code that we further augmented for our purposes herein. ... For IGNN, following the original paper, we optionally set f(X) = MLP(PX) or MLP(X), while for EIGNN and UGNN we only set f(X) = MLP(X). ... the official IGNN implementation (https://github.com/SwiftieH/IGNN) executes propagation steps until either an acceptable error rate or an upper threshold is reached. |
| Open Datasets | Yes | First, we train separate IGNN and UGNN models on the ogbn-arxiv dataset (Hu et al., 2020)... For this purpose we next turn to the Amazon Co-Purchase dataset, a node classification benchmark often used for evaluating long-range dependencies, in part because of sparse labels relative to graph size (Baker et al., 2023; Dai et al., 2018; Gu et al., 2020). ...We begin with the synthetic Chains node classification dataset (Gu et al., 2020; Liu et al., 2021a) explicitly tailored to evaluate GNN models w.r.t. long-range dependencies. ... we use several heterophily datasets, including the Texas, Wisconsin, Actor, and Cornell datasets introduced by Pei et al. (2019), and Roman-empire, Amazon-ratings, Minesweeper, Tolokers, and Questions introduced by Platonov et al. (2023). |
| Dataset Splits | Yes | First, we train separate IGNN and UGNN models on the ogbn-arxiv dataset (Hu et al., 2020), where the architectures are equivalent with the exception of Wp and the number of propagation steps (see Appendix for all network and training details). Additionally, because UGNN requires symmetric propagation weights, a matching parameter count can be achieved by simply expanding the UGNN hidden dimension. Once trained, we then generate predictions from each model and treat them as new, ground-truth labels. ... We adopt the test setup from (Dai et al., 2018; Gu et al., 2020; Baker et al., 2023), and compare performance using different label ratios. ... For the Amazon Co-Purchase dataset, we use the version provided by the IGNN repo (Gu et al., 2020), including the data-processing and evaluation code, in order to obtain a fair comparison. As for splitting, 10% of nodes are selected as the test set. Because there is no dev set, we directly report the test result of the last epoch. We also vary the fraction of training nodes from 5% to 9%. ... To test this hypothesis, we first train both IGNN and UGNN models on the four ASDinf datasets from Section 7.3.1, using the data splits, processed node features, and labels provided by (Pei et al., 2019). ... We use the Deep Robust library (Li et al., 2020b) and apply the exact same non-targeted attack setting as in (Zhang and Zitnik, 2020). |
| Hardware Specification | Yes | Results are shown in Figures 6 and 7 based on executing 100 steps of training and evaluation on a single Tesla T4 GPU. |
| Software Dependencies | No | The paper mentions software tools used like the 'Deep Graph Library (DGL)' and the 'Deep Robust library', but does not specify version numbers for these software components. |
| Experiment Setup | Yes | For generating models, we first train them using the original labels of the dataset for 500 steps. For UGNN, we fix the number of propagation steps to 2, and adopt (11) of UGNN. ... We set the hidden size to 32 in the asymmetric case and 34 in the symmetric case to ensure nearly the same number of parameters (we deliberately keep the hidden size this small to ensure the models do not overfit). ... we compare the models using graph data corrupted via the Mettack algorithm (Zügner and Günnemann, 2019). ... setting the edge perturbation rate to 20%, adopting the Meta-Self training strategy, and a GCN as the surrogate model. |
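The rows above contrast IGNN-style propagation (iterate until an acceptable error rate or an upper threshold of steps is reached) with UGNN-style propagation (a fixed, small unfolding depth, e.g. 2 steps, with symmetric weights). The following is a minimal NumPy sketch of that contrast; the function names, the ReLU nonlinearity, and the contraction-inducing scaling are our own assumptions for illustration, not code from the paper or the IGNN repo.

```python
import numpy as np

def ignn_propagate(A_hat, f_x, W, tol=1e-6, max_steps=300):
    """Sketch of implicit (IGNN-style) propagation: iterate the
    fixed-point update Z <- relu(A_hat @ Z @ W + f_x) until the
    residual falls below `tol` or `max_steps` is hit, mirroring
    the 'acceptable error rate or upper threshold' stopping rule."""
    Z = np.zeros_like(f_x)
    for step in range(max_steps):
        Z_new = np.maximum(A_hat @ Z @ W + f_x, 0.0)
        if np.linalg.norm(Z_new - Z) < tol:
            return Z_new, step + 1
        Z = Z_new
    return Z, max_steps

def ugnn_propagate(A_hat, f_x, W_sym, num_steps=2):
    """Sketch of unfolded (UGNN-style) propagation: a fixed, small
    number of unrolled steps (2, as in the setup above) using a
    symmetric propagation weight matrix, as UGNN requires."""
    Z = f_x
    for _ in range(num_steps):
        Z = np.maximum(A_hat @ Z @ W_sym + f_x, 0.0)
    return Z

# Toy example: scale A_hat and W so the implicit iteration is a
# contraction (||A_hat||_2 * ||W||_2 = 0.5 < 1) and must converge.
rng = np.random.default_rng(0)
n, d = 10, 4
A = rng.random((n, n)); A = (A + A.T) / 2
A_hat = A / np.linalg.norm(A, 2)          # spectral norm 1
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)           # spectral norm 0.5
f_x = rng.standard_normal((n, d))

Z_implicit, steps = ignn_propagate(A_hat, f_x, W)
Z_unfolded = ugnn_propagate(A_hat, f_x, (W + W.T) / 2, num_steps=2)
```

Because ReLU is 1-Lipschitz, the scaling makes the implicit update a 0.5-contraction, so `ignn_propagate` converges to (approximately) a fixed point well before the step threshold, while `ugnn_propagate` always costs exactly two propagation steps regardless of the spectral properties of the weights.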