Graph Adaptive Autoregressive Moving Average Models
Authors: Moshe Eliasof, Alessio Gravina, Andrea Ceni, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schönlieb
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 26 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods. |
| Researcher Affiliation | Academia | 1Department of Applied Mathematics, University of Cambridge, Cambridge, United Kingdom 2Department of Computer Science, University of Pisa, Pisa, Italy. Correspondence to: Moshe Eliasof <EMAIL>, Alessio Gravina <EMAIL>, Andrea Ceni <EMAIL>. |
| Pseudocode | No | The paper describes mathematical equations and procedures in text, such as Equation (6) for GRAMA Recurrence, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code at https://github.com/MosheEliasof/GRAMA. |
| Open Datasets | Yes | Experiments on 26 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods. Specifically, we show the efficacy in performing long-range propagation, thereby mitigating oversquashing. To this end, we evaluate GRAMA on a graph transfer task (Gravina et al., 2025) in Section 5.1. In a similar spirit, we assess GRAMA on synthetic benchmarks that require the exchange of messages at large distances over the graph, called graph property prediction from Gravina et al. (2023), in Section 5.2. We also verify GRAMA on real-world datasets, including the long-range graph benchmark (Dwivedi et al., 2022b) in Section 5.3, and additional GNN benchmarks in Appendix E.1, where we consider MalNet-Tiny (Freitas et al., 2021), the heterophilic node classification datasets from Platonov et al. (2023), ZINC12k, OGBG-MOLHIV, Cora, CiteSeer, Pubmed, MNIST, CIFAR10, PATTERN, and CLUSTER. |
| Dataset Splits | Yes | We generated 1000 graphs for training, 100 for validation, and 100 for testing. We used the official splits from Dwivedi et al. (2022b), and reported the average and standard-deviation performance across 3 seeds. We used stratified splitting, following a 70%-10%-20% split, as in Freitas et al. (2021). On the heterophilic datasets, we use the official splits provided in Platonov et al. (2023) and report the average and standard deviation of the obtained performance. For MalNet-Tiny, we repeat the experiment on 4 different seeds and report the average performance alongside the standard deviation. |
| Hardware Specification | Yes | Our experiments are run on NVIDIA A6000 and A100 GPUs, with 48GB and 80GB of memory, respectively. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and "AdamW optimizer" but does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | We perform hyperparameter tuning via grid search, optimizing the Mean Squared Error (MSE) computed on the node features of the whole graph. We train the models using the Adam optimizer for a maximum of 2000 epochs and early stopping with a maximal patience of 100 epochs on the validation loss. For each model configuration, we perform 4 training runs with different weight initialization and report the average of the results. Table 4: The grid of hyperparameters employed during model selection for the graph transfer tasks (Transfer), graph property prediction tasks (GraphProp), Long Range Graph Benchmark (LRGB), and GNN benchmarks (G-Bench), i.e., MalNet-Tiny and heterophilic datasets. Hyperparameters Values: Learning rate (0.001, 0.003, 0.001, 0.0005, 0.0001), Weight decay (0, 10e-6, 0.0001), Dropout (0, 0.3, 0.5), Activation function (ReLU, ELU, GELU), Embedding dim (64, 10-30, 64-256), Sequence Length (1-50, 1-20, 2-16), Blocks (1, 2, 4), Graph Backbone (GCN, GPS, GatedGCN). |
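The grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' actual tuning code: the dictionary keys and the `configurations` helper are assumptions, and the value lists are transcribed from the quoted hyperparameter grid (ranged values such as the embedding dimension and sequence length are omitted for brevity).

```python
from itertools import product

# Hypothetical hyperparameter grid, transcribed from the paper's Table 4
# (discrete values only; ranged entries like "64-256" are omitted here).
grid = {
    "lr": [0.001, 0.003, 0.0005, 0.0001],
    "weight_decay": [0, 1e-6, 1e-4],
    "dropout": [0.0, 0.3, 0.5],
    "activation": ["ReLU", "ELU", "GELU"],
    "backbone": ["GCN", "GPS", "GatedGCN"],
}

def configurations(grid):
    """Yield every combination in the grid as a config dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Each config would then be trained 4 times with different weight
# initializations, with early stopping (patience 100) on validation loss,
# and the 4-run average selected on.
configs = list(configurations(grid))
```

In a full run, each `config` dict would be passed to a training loop; the paper selects the configuration whose averaged validation metric is best.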