Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Authors: Joshua Southern, Francesco Di Giovanni, Michael Bronstein, Johannes Lutzeyer

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, in Section 5, we first validate our theoretical analysis through extensive ablations and experiments. Next, we evaluate MPNN + VNG and show that it consistently surpasses the baseline MPNN + VN, precisely on those tasks where node heterogeneity matters.
Researcher Affiliation | Collaboration | (1) Imperial College London; (2) University of Oxford; (3) AITHYRA; (4) LIX, École Polytechnique, IP Paris
Pseudocode | No | The paper provides mathematical equations and model definitions (e.g., Equations 1-3 and 8-16) but does not include any clearly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to source code repositories.
Open Datasets | Yes | We evaluated MPNN + VNG on the Long-Range Graph Benchmark using a fixed 500k parameter budget and averaging over four runs. These molecular datasets (Peptides-Func, Peptides-Struct) have been proposed to test a method's ability to capture long-range dependencies. Additionally, we used two graph-level image-based datasets from Benchmarking GNNs (Dwivedi et al., 2023), where we run our model over 10 seeds. We also used a code dataset, MalNet-Tiny (Freitas et al., 2020), consisting of function call graphs with up to 5,000 nodes. We then evaluated our approach on three graph-level tasks from the Open Graph Benchmark (Hu et al., 2020), namely molhiv, molpcba and ppa.
Dataset Splits | Yes | All the benchmarks follow the standard train/validation/test splits. The test performance at the epoch with the best validation performance is reported and is averaged over multiple runs with different random seeds. All the benchmarking results, including the extra ablations, are based on 10 runs, except for Peptides-func and Peptides-struct, which are based on the output of four runs.
Hardware Specification | Yes | All experiments were run on a single V100 GPU.
Software Dependencies | No | The paper mentions using an "AdamW optimizer" but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | For these datasets we optimized the hyperparameters over the following ranges: Dropout [0, 0.1, 0.2], Feed Forward Block [True, False], Depth [4, 6, 8, 10], Positional Encoding [none, LapPE, RWSE], Layers Post Message-Passing [1, 2, 3]... Tables 4 and 5: Best performing hyperparameters for GatedGCN+PE+VNG
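The hyperparameter ranges in the Experiment Setup row above define a finite grid. As an illustration of the size of that search space (not the authors' actual tuning code, which the paper does not release), a minimal sketch enumerating the grid with `itertools.product`, using hypothetical key names that mirror the quoted ranges:

```python
# Sketch: enumerate the hyperparameter grid reported in the table above.
# Key names are hypothetical; the values are the ranges quoted in the paper.
from itertools import product

grid = {
    "dropout": [0.0, 0.1, 0.2],
    "feed_forward_block": [True, False],
    "depth": [4, 6, 8, 10],
    "positional_encoding": ["none", "LapPE", "RWSE"],
    "layers_post_mp": [1, 2, 3],
}

# One dict per candidate configuration: 3 * 2 * 4 * 3 * 3 = 216 combinations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```

Note the full grid has 216 cells; whether the authors exhaustively swept it or sampled from it is not stated in the excerpt.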
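The Dataset Splits row describes a standard model-selection protocol: report the test metric at the epoch with the best validation metric, averaged over seeds. A minimal sketch of that protocol, assuming a higher-is-better metric and a hypothetical per-seed list of (validation, test) pairs per epoch:

```python
# Sketch of the reported protocol: for each seed, pick the epoch with the
# best validation metric, take that epoch's test metric, average over seeds.
# Assumes higher is better; `runs` is a hypothetical data structure.

def select_and_average(runs):
    """runs: list (one per seed) of per-epoch (val_metric, test_metric) pairs."""
    selected = []
    for epochs in runs:
        best_epoch = max(range(len(epochs)), key=lambda e: epochs[e][0])
        selected.append(epochs[best_epoch][1])
    return sum(selected) / len(selected)

runs = [
    [(0.70, 0.68), (0.75, 0.72), (0.73, 0.74)],  # seed 0: best val at epoch 1
    [(0.71, 0.69), (0.72, 0.70), (0.76, 0.73)],  # seed 1: best val at epoch 2
]
avg_test = select_and_average(runs)  # (0.72 + 0.73) / 2 = 0.725
```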
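Several rows above refer to MPNN + VN, the standard baseline that augments a graph with a virtual node connected to every existing node. A minimal sketch of that augmentation on a plain edge list, assuming a directed edge representation (the helper name is hypothetical and this is not the paper's VNG construction):

```python
# Sketch: standard virtual-node (VN) augmentation over a directed edge list.
# A new node is appended and connected bidirectionally to every existing node.

def add_virtual_node(num_nodes, edges):
    """Return the augmented edge list; node index `num_nodes` is the VN."""
    vn = num_nodes  # index assigned to the new virtual node
    vn_edges = []
    for v in range(num_nodes):
        vn_edges.append((v, vn))  # node -> virtual node
        vn_edges.append((vn, v))  # virtual node -> node
    return edges + vn_edges

# Example: a 3-node path graph gains 2 * 3 = 6 virtual-node edges.
augmented = add_virtual_node(3, [(0, 1), (1, 2)])
```

The virtual node gives every pair of nodes a two-hop path, which is why it is commonly discussed as a remedy for oversquashing; the paper's VNG variant additionally accounts for node heterogeneity.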