WILTing Trees: Interpreting the Distance Between MPNN Embeddings
Authors: Masahiro Negishi, Thomas Gärtner, Pascal Welke
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that MPNNs define the relative position of embeddings by focusing on a small set of subgraphs that are known to be functionally important in the domain. Section 6. Experiments: In this section, we confirm that our proposed d_WILT can successfully approximate d_MPNN. Then, we show that the distribution of learned edge weights of WILT is skewed towards 0, and a large part of them can be removed with L1 regularization. Finally, we investigate the WL colors that influence d_MPNN most. Due to space limitations, we report results only for a selection of MPNNs and datasets. Code is available online, and experimental settings and additional results are in Appendix E. |
| Researcher Affiliation | Academia | 1TU Wien, Vienna, Austria 2Lancaster University Leipzig, Leipzig, Germany. Correspondence to: Masahiro Negishi <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Optimizing edge weights of WILT |
| Open Source Code | Yes | Code is available online, and experimental settings and additional results are in Appendix E. The code to run our experiments is available at https://github.com/masahiro-negishi/wilt. |
| Open Datasets | Yes | We conduct experiments on three different datasets: Mutagenicity and ENZYMES (Morris et al., 2020), and Lipophilicity (Wu et al., 2018). We chose these datasets to represent binary classification, multiclass classification, and regression tasks, respectively. Next, we offer additional experimental results on non-molecular datasets: IMDB-BINARY and COLLAB (obtained from Morris et al., 2020). |
| Dataset Splits | Yes | In each setting, we split the dataset into Dtrain, Deval, and Dtest (8:1:1). We train the model for 100 epochs and record the performance on Deval after each epoch. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU types) for running its experiments. It only mentions running models, but without hardware specifications. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and training "GCNs with mean or sum pooling", but it does not specify version numbers for these or other software libraries (e.g., PyTorch, TensorFlow, scikit-learn) or programming languages, which are crucial for reproducibility. |
| Experiment Setup | Yes | For each model architecture, we vary the number of message passing layers (1, 2, 3, 4), the embedding dimensions (32, 64, 128), and the graph pooling methods (mean, sum). This results in a total of 2 × 4 × 3 × 2 = 48 different MPNNs for each dataset. In each setting, we split the dataset into Dtrain, Deval, and Dtest (8:1:1). We train the model for 100 epochs and record the performance on Deval after each epoch. We set the batch size to 32, and use the Adam optimizer with learning rate of 10⁻³. ALI_k(d_MPNN, d_func) and the performance metric (accuracy for Mutagenicity and ENZYMES, RMSE for Lipophilicity) are calculated with the model at the epoch that performed best on Deval. |
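The configuration grid and 8:1:1 split described in the "Experiment Setup" and "Dataset Splits" rows can be sketched as follows. This is a minimal illustration, not the authors' code: the architecture names (the paper's quotes mention GCN; "GIN" is assumed here as the second architecture) and the helper names `config_grid` and `make_splits` are hypothetical.

```python
import itertools
import random

# Hyperparameter grid quoted in the "Experiment Setup" row.
ARCHITECTURES = ["GIN", "GCN"]  # assumption: 2 MPNN architectures; the paper's quotes name only GCN
NUM_LAYERS = [1, 2, 3, 4]
EMBED_DIMS = [32, 64, 128]
POOLINGS = ["mean", "sum"]

def config_grid():
    """Enumerate all 2 x 4 x 3 x 2 = 48 MPNN configurations per dataset."""
    return [
        {"arch": a, "layers": l, "dim": d, "pool": p}
        for a, l, d, p in itertools.product(
            ARCHITECTURES, NUM_LAYERS, EMBED_DIMS, POOLINGS
        )
    ]

def make_splits(n, seed=0):
    """Shuffle dataset indices and split 8:1:1 into Dtrain, Deval, Dtest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * n)
    n_eval = int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_eval], idx[n_train + n_eval:]
```

Each of the 48 configurations would then be trained for 100 epochs (batch size 32, Adam, learning rate 10⁻³), selecting the checkpoint that performs best on Deval.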