Supercharging Graph Transformers with Advective Diffusion
Authors: Qitian Wu, Chenxiao Yang, Kaipeng Zeng, Michael M. Bronstein
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions. Experiments show that our model offers superior generalization performance across various downstream predictive tasks in diverse domains, including information networks, molecular screening, and protein interactions. |
| Researcher Affiliation | Collaboration | 1 Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; 2 Toyota Technological Institute at Chicago; 3 Shanghai Jiao Tong University; 4 University of Oxford; 5 Aithyra. Correspondence to: Qitian Wu <EMAIL>. |
| Pseudocode | Yes | Alg. 1 summarizes the feed-forward computation of ADVDIFFORMER-I. Alg. 2 presents the feed-forward computation of ADVDIFFORMER-S that only requires O(N) algorithmic complexity. |
| Open Source Code | Yes | Codes are available at https://github.com/qitianwu/AdvDIFFormer |
| Open Datasets | Yes | Information Networks. We first consider citation networks Arxiv (Hu et al., 2020) and social networks Twitch (Rozemberczki et al., 2021) with graph sizes ranging from 2K to 0.2M, where we use the scalable version ADVDIFFORMER-S. Protein Interactions. We then test on protein-protein interactions (Fu & He, 2022). The Human Annotated Mappings (HAM) dataset (Li et al., 2020) consists of 1,206 molecules with expert-annotated mapping operators. |
| Dataset Splits | Yes | To introduce topological shifts, we partition the data according to publication years and geographic information for Arxiv and Twitch, respectively. We use the publication years to split the data: papers published before 2014 for training, within the range from 2014 to 2017 for validation, and on 2018/2019/2020 for testing. We therefore split the data according to the geographic information: use the network DE for training, ENGB for validation, and the remaining networks for testing. We consider the dataset-level data splitting and use 6/1/5 graph datasets for training/validation/testing. For data splits, we calculate the relative molecular mass of each molecule using the RDKit package, and rank the molecules with increasing mass. Then we use the first 70% molecules for training, the following 15% for validation, and the remaining for testing. |
| Hardware Specification | Yes | All the experiments are run on NVIDIA 3090 with 24GB memory. |
| Software Dependencies | Yes | The environment is based on Ubuntu 18.04.6, CUDA 11.6, PyTorch 1.13.0 and PyTorch Geometric 2.1.0. |
| Experiment Setup | Yes | Hyper-Parameters. We use the grid search for hyper-parameter tuning on the validation dataset with the searching space described below. For information networks, hidden size d ∈ {32, 64, 128}, learning rate ∈ {0.0001, 0.001}, head number H ∈ {1, 2, 4}, the weight for local message passing β ∈ {0.2, 0.5, 0.8, 1.0}, and the order of propagation (only used for ADVDIFFORMER-S) K ∈ {1, 2, 4}. For molecular datasets, hidden size d = 256, learning rate ∈ {0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005}, dropout ∈ {0.0, 0.1, 0.3, 0.5}, head number H ∈ {1, 2, 4}, the weight for local message passing β ∈ {0.5, 0.75, 1.0}, the coefficient for identity matrix (only used for ADVDIFFORMER-I) θ ∈ {0.5, 1.0}, and the order of propagation (only used for ADVDIFFORMER-S) K ∈ {1, 2, 3, 4}. For protein interaction networks, hidden size d ∈ {32, 64}, learning rate ∈ {0.01, 0.001, 0.0001}, head number H ∈ {1, 2, 4}, the weight for local message passing β ∈ {0.3, 0.5, 0.8, 1.0}, the coefficient for identity matrix (only used for ADVDIFFORMER-I) θ ∈ {0.5, 1.0}, and the order of propagation (only used for ADVDIFFORMER-S) K ∈ {1, 2, 3, 4}. |
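The grid search described in the Experiment Setup row can be sketched as below. This is a minimal illustration using only the information-network search space quoted from the paper; the dictionary keys and the `grid_search` helper are hypothetical names, and the actual training/evaluation loop in the authors' code is not reproduced here.

```python
from itertools import product

# Search space for information networks, as reported in the paper:
# hidden size d, learning rate, head number H, local message-passing
# weight beta, and propagation order K (ADVDIFFORMER-S only).
grid = {
    "hidden_size": [32, 64, 128],
    "lr": [0.0001, 0.001],
    "num_heads": [1, 2, 4],
    "beta": [0.2, 0.5, 0.8, 1.0],
    "K": [1, 2, 4],
}

def grid_search(space):
    """Yield every hyper-parameter configuration in the grid."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_search(grid))
print(len(configs))  # 3 * 2 * 3 * 4 * 3 = 216 configurations
```

Each configuration would then be trained and scored on the validation split, with the best-scoring configuration reported on the test split.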