Simple Path Structural Encoding for Graph Transformers

Authors: Louis Airale, Antonio Longa, Mattia Rigon, Andrea Passerini, Roberto Passerone

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate SPSE on extensive benchmarks, including molecular datasets from Benchmarking GNNs (Dwivedi et al., 2023), Long-Range Graph Benchmarks (Dwivedi et al., 2022), and Large-Scale Graph Regression Benchmarks (Hu et al., 2021). SPSE consistently outperforms RRWP in graph-level and node-level tasks, demonstrating significant improvements in molecular and long-range datasets.
Researcher Affiliation | Academia | University of Trento, Trento, Italy. Correspondence to: Louis Airale <EMAIL>, Roberto Passerone <EMAIL>.
Pseudocode | Yes | Algorithm 1: Count paths between all pairs of nodes (simplified); Algorithm 2: DAGDECOMPOSE, decomposition of an input graph into multiple DAGs.
Open Source Code | Yes | The Python implementation of the algorithm is available on the project's GitHub page.
Open Datasets | Yes | We conduct experiments on graph datasets from three distinct benchmarks, covering both node- and graph-level tasks. These include ZINC, CLUSTER, PATTERN, MNIST, and CIFAR10 from Benchmarking GNNs (Dwivedi et al., 2023), Peptides-functional and Peptides-structural from the Long-Range Graph Benchmark (Dwivedi et al., 2022), and the 3.7M-sample PCQM4Mv2 dataset from the Large-Scale Graph Regression Benchmark (Hu et al., 2021).
Dataset Splits | Yes | To validate Proposition 3, we design a synthetic dataset consisting of 12,000 graphs. ... The dataset is split into training (10,000 graphs), validation (1,000 graphs), and test (1,000 graphs) sets.
Hardware Specification | No | The paper mentions 'gigaflops' in Section 5.1 when discussing model complexities but provides no specific hardware details, such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions 'The Python implementation of the algorithm' but does not specify a Python version, nor versions of any other libraries or frameworks used, such as PyTorch or TensorFlow.
Experiment Setup | Yes | We train these models using three hyperparameter configurations adopted from (Menegaux et al., 2023). These correspond to the setups used for ZINC (config #1), PATTERN (config #2), and CIFAR10 (config #3), covering a range of model complexities from 40 to 280 gigaflops.

Table 3. Model configurations used for the synthetic experiments.

Config | Transformer layers | Self-attention heads | Hidden dimension | Learning rate | Epochs
#1     | 3                  | 4                    | 52               | 10^-3         | 100
#2     | 6                  | 4                    | 64               | 5 x 10^-4     | 300
#3     | 10                 | 8                    | 64               | 5 x 10^-4     | 400
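The Pseudocode row cites Algorithm 1 (counting paths between all node pairs) and Algorithm 2 (DAG decomposition). As a rough illustration of what Algorithm 1 computes, the sketch below counts simple paths of each length between all node pairs with a brute-force DFS. The function name and adjacency representation are assumptions, and this version deliberately omits the paper's DAG decomposition, which is what makes the counting tractable on real graphs.

```python
from itertools import product


def count_simple_paths(adj, max_len):
    """Count simple paths of length 1..max_len between all node pairs.

    adj: dict mapping each node to its set of neighbours (undirected graph).
    Returns counts[(u, v)][k] = number of simple paths of length k from u to v.
    Brute-force DFS, exponential in max_len; shown only to illustrate the
    quantity the paper's Algorithms 1-2 compute efficiently.
    """
    nodes = list(adj)
    counts = {(u, v): [0] * (max_len + 1) for u, v in product(nodes, nodes)}

    def dfs(start, node, visited, length):
        if length >= 1:
            counts[(start, node)][length] += 1
        if length == max_len:
            return
        for nxt in adj[node]:
            if nxt not in visited:  # simple paths never revisit a node
                visited.add(nxt)
                dfs(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in nodes:
        dfs(s, s, {s}, 0)
    return counts
```

For example, on a triangle there is exactly one length-1 path and one length-2 path between any two distinct nodes.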
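The Dataset Splits row reports a 10,000/1,000/1,000 split of the 12,000-graph synthetic dataset. A minimal sketch of how such an index split could be realised; the shuffling strategy and seed are assumptions, not taken from the paper:

```python
import random


def split_indices(n, n_train, n_val, n_test, seed=0):
    """Return disjoint train/val/test index lists covering range(n)."""
    assert n_train + n_val + n_test == n
    rng = random.Random(seed)  # fixed seed for reproducibility (assumed)
    idx = list(range(n))
    rng.shuffle(idx)
    return (
        idx[:n_train],
        idx[n_train:n_train + n_val],
        idx[n_train + n_val:],
    )


train, val, test = split_indices(12_000, 10_000, 1_000, 1_000)
```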
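The three hyperparameter configurations from Table 3 can be captured as plain dictionaries for reference. The key names are illustrative and not taken from the authors' code; only the values come from the paper.

```python
# Table 3 configurations (Menegaux et al., 2023 setups for ZINC, PATTERN, CIFAR10).
CONFIGS = {
    1: dict(layers=3, heads=4, hidden_dim=52, lr=1e-3, epochs=100),
    2: dict(layers=6, heads=4, hidden_dim=64, lr=5e-4, epochs=300),
    3: dict(layers=10, heads=8, hidden_dim=64, lr=5e-4, epochs=400),
}
```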