Graph Adaptive Autoregressive Moving Average Models
Authors: Moshe Eliasof, Alessio Gravina, Andrea Ceni, Claudio Gallicchio, Davide Bacciu, Carola-Bibiane Schönlieb
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 26 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods. |
| Researcher Affiliation | Academia | 1Department of Applied Mathematics, University of Cambridge, Cambridge, United Kingdom 2Department of Computer Science, University of Pisa, Pisa, Italy. Correspondence to: Moshe Eliasof <EMAIL>, Alessio Gravina <EMAIL>, Andrea Ceni <EMAIL>. |
| Pseudocode | No | The paper describes mathematical equations and procedures in text, such as Equation (6) for GRAMA Recurrence, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code at https://github.com/MosheEliasof/GRAMA. |
| Open Datasets | Yes | Experiments on 26 synthetic and real-world datasets demonstrate that GRAMA consistently outperforms backbone models and performs competitively with state-of-the-art methods. Specifically, we show the efficacy in performing long-range propagation, thereby mitigating oversquashing. To this end, we evaluate GRAMA on a graph transfer task (Gravina et al., 2025) in Section 5.1. In a similar spirit, we assess GRAMA on synthetic benchmarks that require the exchange of messages at large distances over the graph, called graph property prediction from Gravina et al. (2023), in Section 5.2. We also verify GRAMA on real-world datasets, including the long-range graph benchmark (Dwivedi et al., 2022b) in Section 5.3, and additional GNN benchmarks in Appendix E.1, where we consider MalNet-Tiny (Freitas et al., 2021), the heterophilic node classification datasets from Platonov et al. (2023), ZINC12k, OGBG-MOLHIV, Cora, CiteSeer, Pubmed, MNIST, CIFAR10, PATTERN, and CLUSTER. |
| Dataset Splits | Yes | We generated 1000 graphs for training, 100 for validation, and 100 for testing. We used the official splits from Dwivedi et al. (2022b), and reported the average and standard-deviation performance across 3 seeds. We used stratified splitting, following a 70%-10%-20% split, as in Freitas et al. (2021). On the heterophilic datasets, we use the official splits provided in Platonov et al. (2023) and report the average and standard deviation of the obtained performance. For MalNet-Tiny, we repeat the experiment on 4 different seeds and report the average performance alongside the standard deviation. |
| Hardware Specification | Yes | Our experiments are run on NVIDIA A6000 and A100 GPUs, with 48GB and 80GB of memory, respectively. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and "AdamW optimizer" but does not specify version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | We perform hyperparameter tuning via grid search, optimizing the Mean Squared Error (MSE) computed on the node features of the whole graph. We train the models using the Adam optimizer for a maximum of 2000 epochs and early stopping with a maximal patience of 100 epochs on the validation loss. For each model configuration, we perform 4 training runs with different weight initialization and report the average of the results. Table 4: The grid of hyperparameters employed during model selection for the graph transfer tasks (Transfer), graph property prediction tasks (GraphProp), Long Range Graph Benchmark (LRGB), and GNN benchmarks (G-Bench), i.e., MalNet-Tiny and heterophilic datasets. Hyperparameters Values: Learning rate (0.001, 0.003, 0.001, 0.0005, 0.0001), Weight decay (0, 10e-6, 0.0001), Dropout (0, 0.3, 0.5), Activation function (ReLU, ELU, GELU), Embedding dim (64, 10-30, 64-256), Sequence Length (1-50, 1-20, 2-16), Blocks (1, 2, 4), Graph Backbone (GCN, GPS, GatedGCN). |
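The grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' actual tuning code: the dictionary keys and the `configurations` helper are assumptions, and the value lists are transcribed from the quoted hyperparameter grid (ranged values such as the embedding dimension and sequence length are omitted for brevity).

```python
from itertools import product

# Hypothetical hyperparameter grid, transcribed from the paper's Table 4
# (discrete values only; ranged entries like "64-256" are omitted here).
grid = {
    "lr": [0.001, 0.003, 0.0005, 0.0001],
    "weight_decay": [0, 1e-6, 1e-4],
    "dropout": [0.0, 0.3, 0.5],
    "activation": ["ReLU", "ELU", "GELU"],
    "backbone": ["GCN", "GPS", "GatedGCN"],
}

def configurations(grid):
    """Yield every combination in the grid as a config dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

# Each config would then be trained 4 times with different weight
# initializations, with early stopping (patience 100) on validation loss,
# and the 4-run average selected on.
configs = list(configurations(grid))
```

In a full run, each `config` dict would be passed to a training loop; the paper selects the configuration whose averaged validation metric is best.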