Graph Neural Network Generalization With Gaussian Mixture Model Based Augmentation
Authors: Yassine Abbahaddou, Fragkiskos D. Malliaros, Johannes F. Lutzeyer, Amine M. Aboussalah, Michalis Vazirgiannis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach not only outperforms existing augmentation techniques in terms of generalization but also offers improved time complexity, making it highly suitable for real-world applications. Our code is publicly available at: https://github.com/abbahaddou/GRATIN. ... Empirical Validation. Through experiments on real-world datasets we confirm GRATIN to be a fast, high-performing graph augmentation scheme in practice. ... In this section, we present our results and analysis. Our experimental setup is described in Appendix J. |
| Researcher Affiliation | Academia | 1LIX, École Polytechnique, IP Paris 2Université Paris-Saclay, CentraleSupélec, Inria 3NYU Tandon School of Engineering 4MBZUAI. Correspondence to: Yassine Abbahaddou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Graph classification with GRATIN Inputs: GNN of T layers f(·, θ) = Ψ ∘ READOUT ∘ g, where g is the composition of message passing layers, i.e., g = ∘_{t=0}^{T} {AGGREGATE(t) ∘ COMBINE(t)(·)}, and Ψ is the post-readout function; graph classification dataset D; loss function L. Steps: 1. Train GNN f on the training set Dtrain; 2. Use the trained message passing layers and the readout function to generate graph representations H = {h_Gn s.t. Gn ∈ Dtrain} for the training set; 3. Partition the training set Dtrain by classes, such that Dtrain = ∪_c Dc where Dc = {Gn ∈ Dtrain, yn = c}; foreach c ∈ {0, . . . , C} do 3.1. Fit a GMM distribution pc on the graph representations Hc = {h_Gn s.t. Gn ∈ Dc}; 3.2. Sample new graph representations H̃c = {h̃ s.t. h̃ ∼ pc} from the distribution pc; 3.3. Combine the sampled representations H̃c with the trained representations: Hc = Hc ∪ H̃c; end foreach 4. Finetune the post-readout function Ψ on the graph classification task directly on the new training set H = ∪_c Hc; |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/abbahaddou/GRATIN. ... In this section, we detail the experimental setup for the conducted experiments. The necessary code to reproduce all our experiments is available on github at: https://github.com/abbahaddou/GRATIN |
| Open Datasets | Yes | We evaluate our model on five widely used datasets from the GNN literature, specifically IMDB-BIN, IMDB-MUL, PROTEINS, MUTAG, and DD, all sourced from the TUD Benchmark (Morris et al., 2020). These datasets consist of either molecular or social graphs. Detailed statistics for each dataset are provided in Table 10. |
| Dataset Splits | Yes | We split the dataset into train/test/validation set by 80%/10%/10% and use 10-fold cross-validation for evaluation following the recent work of Zeng et al. (2024). |
| Hardware Specification | Yes | The experiments were conducted on an RTX A6000 GPU. |
| Software Dependencies | No | We used the PyTorch Geometric (PyG) open-source library, licensed under MIT (Fey & Lenssen, 2019). ... using the Adam optimizer (Kingma & Ba, 2014). |
| Experiment Setup | Yes | The GNN was trained on graph classification tasks for 300 epochs with a learning rate of 10⁻² using the Adam optimizer (Kingma & Ba, 2014). To model the graph representations of each class, we fit a GMM using the EM algorithm, running for 100 iterations or until the average lower bound gain dropped below 10⁻³. The number of Gaussians used in the GMM is provided in Table 11. After generating new graph representations from each GMM, we fine-tuned the post-readout function for 100 epochs, maintaining the same learning rate of 10⁻². ... We utilized two GNN architectures, GIN and GCN, both consisting of two layers with a hidden dimension of 32. |
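The augmentation loop quoted in the Pseudocode row (fit a per-class GMM on graph-level representations, sample synthetic representations, merge them with the originals) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes graph representations are already available as a NumPy array (i.e., the output of the trained readout, before the post-readout classifier Ψ), and it uses scikit-learn's `GaussianMixture` with the EM settings quoted in the Experiment Setup row (100 iterations, tolerance 10⁻³). The function name `augment_with_gmm` and the per-class sample count are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def augment_with_gmm(embeddings, labels, n_components=2,
                     n_samples_per_class=50, seed=0):
    """Per-class GMM augmentation of graph-level representations.

    embeddings: (N, d) array of graph representations (readout output).
    labels:     (N,) integer class labels.
    Returns (X_aug, y_aug) containing the originals plus GMM samples.
    """
    aug_X, aug_y = [embeddings], [labels]
    for c in np.unique(labels):
        Xc = embeddings[labels == c]
        # Step 3.1: fit a GMM p_c on this class's representations
        # (EM, capped at 100 iterations or lower-bound gain < 1e-3).
        gmm = GaussianMixture(n_components=min(n_components, len(Xc)),
                              max_iter=100, tol=1e-3, random_state=seed)
        gmm.fit(Xc)
        # Step 3.2: sample new representations h~ from p_c.
        Xs, _ = gmm.sample(n_samples_per_class)
        # Step 3.3: merge sampled with trained representations.
        aug_X.append(Xs)
        aug_y.append(np.full(len(Xs), c))
    return np.concatenate(aug_X), np.concatenate(aug_y)
```

Step 4 of the algorithm would then fine-tune only the post-readout function Ψ on `(X_aug, y_aug)`, which is what makes the scheme cheap relative to augmentations that regenerate and re-encode whole graphs.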