Heterophily-informed Message Passing
Authors: Haishan Wang, Arno Solin, Vikas K Garg
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, conducted across various data sets and GNN architectures, demonstrate performance enhancements and reveal heterophily patterns across standard classification benchmarks. Furthermore, application to molecular generation showcases notable performance improvements on chemoinformatics benchmarks. |
| Researcher Affiliation | Collaboration | Haishan Wang (Aalto University), Arno Solin (Aalto University), Vikas Garg (YaiYai Ltd and Aalto University) |
| Pseudocode | No | The paper describes processes like message passing, training, and generation in text. While it uses mathematical equations and explains steps, it does not include a distinct, labeled pseudocode or algorithm block. |
| Open Source Code | Yes | A reference implementation of the methods is available at https://github.com/AaltoML/heterophily-imp. |
| Open Datasets | Yes | We evaluated on 5 homophilic data sets from citation networks (Yang et al., 2016) (Cora, PubMed, CiteSeer) and co-purchase graphs (Shchur et al., 2018) (Computers, Photo). Furthermore, we used 10 heterophilic data sets, including hyperlink networks (Pei et al., 2019) (Cornell, Wisconsin, Texas), Wikipedia networks (Rozemberczki et al., 2021) (Chameleon, Squirrel), and the heterophilous graph datasets of Platonov et al. (2023) (Roman-empire, Amazon-ratings, Minesweeper, Tolokers, Questions). We consider two common molecule data sets: QM9 and ZINC-250k. The QM9 data set (Ramakrishnan et al., 2014) comprises 134k stable small organic molecules... The ZINC-250k (Irwin et al., 2012) data contains 250k drug-like molecules... |
| Dataset Splits | Yes | The data split settings were training/validation/test 60%/20%/20%. Each configuration (data set and model) is tested for 10 random model initializations and data splits. For three heterophilic data sets (Cornell, Wisconsin, and Texas), the data is split into train/validate/test with 10 fixed seeds from GEOM-GCN (Pei et al., 2019). In the molecule experiment, all data sets are split with a train/test ratio of 80%/20%. |
| Hardware Specification | Yes | All models in the classification experiment are trained on a Linux cluster equipped with NVIDIA V100 GPUs. The training time and memory requirement for a single run were reported for all modes (orig., hom., het., mix.) and for all architectures. All models in the molecule experiment are trained on a cluster equipped with NVIDIA A100 GPUs. |
| Software Dependencies | No | The models were implemented in PyTorch (Paszke et al., 2019) and PyTorch Geometric (PyG) (Fey & Lenssen, 2019). The text mentions software names like PyTorch and PyTorch Geometric and references papers, but does not provide specific version numbers for these software components used in the implementation. |
| Experiment Setup | Yes | Each model and its variants (HetMP, HomMP) contain 2 layers with 128 dimensions for all hidden layers. All models are trained with the AdamW optimizer (Loshchilov & Hutter, 2019), learning rate 0.001, and drop-out ratio 0.2. The HetFlows in Sec. 4.2 are built on GNNs with 4 layers and flows that were ka = 27, kb = 10 (for QM9) deep and ka = 38, kb = 10 (for ZINC-250k). |
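The table above mentions separate "hom.", "het.", and "mix." message-passing modes. The paper's actual scheme is not reproduced in this report; as a loose, hypothetical illustration of the general idea (splitting a node's neighbourhood into same-label and different-label channels and aggregating them separately), here is a minimal pure-Python sketch. All names (`message_pass`, the mode strings, mean aggregation, the residual update) are assumptions for illustration, not the authors' implementation:

```python
def message_pass(features, edges, labels, mode="mix."):
    """One hypothetical heterophily-aware message-passing step.

    features: dict node -> list[float] feature vector
    edges:    list of undirected (u, v) pairs
    labels:   dict node -> class label, used to split neighbours into
              homophilic (same label) and heterophilic (different label) sets
    mode:     "hom." (same-label only), "het." (different-label only),
              or "mix." (average of both channels)
    """
    # Build adjacency lists from the undirected edge list.
    neigh = {u: [] for u in features}
    for u, v in edges:
        neigh[u].append(v)
        neigh[v].append(u)

    def mean(vecs, dim):
        # Mean-aggregate a list of vectors; empty set contributes zeros.
        if not vecs:
            return [0.0] * dim
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    out = {}
    for u, x in features.items():
        dim = len(x)
        # Split neighbours by label agreement with the centre node.
        hom = [features[v] for v in neigh[u] if labels[v] == labels[u]]
        het = [features[v] for v in neigh[u] if labels[v] != labels[u]]
        if mode == "hom.":
            msg = mean(hom, dim)
        elif mode == "het.":
            msg = mean(het, dim)
        else:  # "mix.": average the two channels
            msg = [(a + b) / 2 for a, b in zip(mean(hom, dim), mean(het, dim))]
        # Residual update: keep the node's own features plus the message.
        out[u] = [xi + mi for xi, mi in zip(x, msg)]
    return out
```

On a toy 3-node graph where node 0 has one same-label and one different-label neighbour, the three modes produce visibly different updates, which is the point of conditioning message passing on (inferred) homophily.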