Graph Transformers Dream of Electric Flow
Authors: Xiang Cheng, Lawrence Carin, Suvrit Sra
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show theoretically and empirically that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems such as electric flow and eigenvector decomposition. The Transformer has access to information on the input graph only via the graph's incidence matrix. We present explicit weight configurations for implementing each algorithm, and we bound the constructed Transformer's errors by the errors of the underlying algorithms. Our theoretical findings are corroborated by experiments on synthetic data. Additionally, on a real-world molecular regression task, we observe that the linear Transformer is capable of learning a more effective positional encoding than the default one based on Laplacian eigenvectors. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Duke University, USA 2School of Computation, Information and Technology, Technical University of Munich, Germany EMAIL, EMAIL |
| Pseudocode | Yes | ALGORITHM 1 Subspace Iteration. Input: Φ₀ ∈ ℝ^{n×k} with full column rank. while not converged do: Φ̂ᵢ₊₁ ← L Φᵢ; Φᵢ₊₁ ← QR(Φ̂ᵢ₊₁). end while. Here QR(Φ) returns the Q matrix in the QR decomposition of Φ. |
| Open Source Code | Yes | Code is available at https://github.com/chengxiang/Linear_Graph_Transformer |
| Open Datasets | Yes | In Section 6, we demonstrate the advantage of using the linear Transformer as a replacement for Laplacian eigenvector PE on a molecular regression benchmark using the QM9 and ZINC datasets (Ruddigkeit et al., 2012; Ramakrishnan et al., 2014; Irwin et al., 2012). |
| Dataset Splits | Yes | For QM9, the training set size is 20,000, the validation set size is 2,000, and the test set size is 100,000; the training and validation sets are subsampled from the full training and validation sets. For ZINC, the training set size is 20,000, the validation set size is 2,000, and the test set size is 24,445; the training and validation sets are subsampled from the full training and validation sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It describes the experiments and training parameters but not the underlying hardware. |
| Software Dependencies | No | The paper mentions 'Adam W' as an optimizer and refers to an 'implementation from Dwivedi & Bresson (2020)', but it does not provide specific version numbers for these or any other software components (e.g., Python, PyTorch, CUDA, etc.). |
| Experiment Setup | Yes | We train using AdamW. Graph Transformer parameters are updated with an initial learning rate of 0.001; linear Transformer parameters are updated with an initial learning rate of 0.01. For ZINC, we train for 2000 epochs with the learning rate halved every 800 epochs. For QM9, we train for 1000 epochs with the learning rate halved every 400 epochs. The means and standard deviations in Table 2 are each computed over 4 independent seeded runs. The GT we use has 128 hidden dimensions, 8 heads, and 4 layers. The position encoding dimension is 3 for QM9 and 6 for ZINC. The linear Transformer we use contains L = 9 layers, with parameters shared every 3 layers, as described in Section 14.2. The dimension of Φl is 8. Before training on the actual regression task, we pretrain the linear Transformer to return the top PE_dim Laplacian eigenvectors. |
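The subspace iteration routine quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's Transformer construction: it iterates Φ̂ᵢ₊₁ ← L Φᵢ followed by a QR re-orthonormalization, converging to the top-k eigenvectors of a symmetric matrix L. The path-graph example and the `subspace_iteration` helper name are our own; the Laplacian is built from the graph's incidence matrix B via L = B Bᵀ, mirroring the paper's use of the incidence matrix as the Transformer's only view of the graph.

```python
import numpy as np

def subspace_iteration(L, k, iters=200, seed=0):
    """Subspace iteration (cf. Algorithm 1): converges to an orthonormal
    basis for the top-k eigenspace of the symmetric matrix L."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    # Random start with full column rank (orthonormalized for stability).
    Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(iters):
        # QR(.) returns the Q factor, as in the paper's pseudocode.
        Phi, _ = np.linalg.qr(L @ Phi)
    return Phi

# Hypothetical example: a path graph on 5 nodes. B is its (edge x node)
# incidence matrix, transposed here to node x edge; L = B B^T is the Laplacian.
B = np.zeros((5, 4))
for e in range(4):
    B[e, e], B[e + 1, e] = 1.0, -1.0
L = B @ B.T
Phi = subspace_iteration(L, k=2)  # basis for the top-2 eigenspace of L
```

Since eigenvectors are only defined up to sign, a sanity check should compare the projector Φ Φᵀ against the one built from a direct eigendecomposition rather than the vectors themselves.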