Heterophily-informed Message Passing

Authors: Haishan Wang, Arno Solin, Vikas K Garg

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments, conducted across various data sets and GNN architectures, demonstrate performance enhancements and reveal heterophily patterns across standard classification benchmarks. Furthermore, application to molecular generation showcases notable performance improvements on chemoinformatics benchmarks.
Researcher Affiliation | Collaboration | Haishan Wang (EMAIL) Aalto University; Arno Solin (EMAIL) Aalto University; Vikas Garg (EMAIL) YaiYai Ltd and Aalto University
Pseudocode | No | The paper describes processes like message passing, training, and generation in text. While it uses mathematical equations and explains steps, it does not include a distinct, labeled pseudocode or algorithm block.
Open Source Code | Yes | A reference implementation of the methods is available at https://github.com/AaltoML/heterophily-imp.
Open Datasets | Yes | We evaluated on 5 homophilic data sets from citation networks (Yang et al., 2016) (Cora, PubMed, CiteSeer) and co-purchase graphs (Shchur et al., 2018) (Computers, Photo). Furthermore, the 10 heterophilic data sets include hyperlink networks (Pei et al., 2019) (Cornell, Wisconsin, Texas), Wikipedia networks (Rozemberczki et al., 2021) (Chameleon, Squirrel), and the heterophilous graph datasets (Platonov et al., 2023) (Roman-empire, Amazon-ratings, Minesweeper, Tolokers, Questions). We consider two common molecule data sets: qm9 and zinc-250k. The qm9 data set (Ramakrishnan et al., 2014) comprises 134k stable small organic molecules... The zinc-250k (Irwin et al., 2012) data contains 250k drug-like molecules...
Dataset Splits | Yes | The data split settings are training/validation/test 60%/20%/20%. Each configuration (data set and model) is tested for 10 random model initializations and data splits. For three heterophilic data sets (Cornell, Wisconsin, and Texas), the data is split into train/validation/test with 10 fixed seeds from GEOM-GCN (Pei et al., 2019). In this experiment, all data sets are split with a train/test ratio of 80%/20%.
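The 60%/20%/20% split over 10 random seeds described above can be sketched as follows. This is a minimal, stdlib-only illustration of the splitting protocol; the function name and seed handling are assumptions, not the authors' actual code:

```python
import random

def split_indices(n_nodes, seed, ratios=(0.6, 0.2, 0.2)):
    """Shuffle node indices with a fixed seed and cut into train/val/test."""
    rng = random.Random(seed)
    idx = list(range(n_nodes))
    rng.shuffle(idx)
    n_train = int(ratios[0] * n_nodes)
    n_val = int(ratios[1] * n_nodes)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# One split per random seed, mirroring the 10-seed protocol.
splits = [split_indices(1000, seed) for seed in range(10)]
train, val, test = splits[0]
print(len(train), len(val), len(test))  # 600 200 200
```

Fixing the seed per split is what makes the 10 train/validation/test partitions reproducible across model runs.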
Hardware Specification | Yes | All models in this experiment are trained on a Linux cluster equipped with NVIDIA V100 GPUs. The training time and memory requirements for a single run were comparable across all modes (orig., hom., het., mix.) and all architectures. All models in this experiment are trained on a cluster equipped with NVIDIA A100 GPUs.
Software Dependencies | No | The models were implemented in PyTorch (Paszke et al., 2019) and PyTorch Geometric (PyG) (Fey & Lenssen, 2019). The text mentions software names like PyTorch and PyTorch Geometric and references papers, but does not provide specific version numbers for the software components used in the implementation.
Experiment Setup | Yes | Each model and its variants (HetMP, HomMP) contain 2 layers with 128 dimensions for all hidden layers. All the models are trained with the AdamW optimizer (Loshchilov & Hutter, 2019), learning rate 0.001, and dropout ratio 0.2. The HetFlow in Sec. 4.2 is built on GNNs with 4 layers and flows of depth ka = 27, kb = 10 (for qm9) and ka = 38, kb = 10 (for zinc-250k).
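The reported hyperparameters can be gathered into a single configuration for a re-run. This is a sketch only: the key names are illustrative, while the values are taken directly from the setup quoted above:

```python
# Hyperparameters reported for the node-classification experiments.
# Key names are illustrative; values come from the quoted setup.
NODE_CLASSIFICATION_CONFIG = {
    "num_layers": 2,
    "hidden_dim": 128,
    "optimizer": "AdamW",
    "learning_rate": 1e-3,
    "dropout": 0.2,
}

# Flow depths reported for the molecule-generation experiments (HetFlow).
HETFLOW_CONFIG = {
    "gnn_layers": 4,
    "qm9": {"ka": 27, "kb": 10},
    "zinc-250k": {"ka": 38, "kb": 10},
}
```

Centralizing these values in one place makes it easy to check a reproduction attempt against the paper's reported settings.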