Graph Neural Network Generalization With Gaussian Mixture Model Based Augmentation
Authors: Yassine Abbahaddou, Fragkiskos D. Malliaros, Johannes F. Lutzeyer, Amine M. Aboussalah, Michalis Vazirgiannis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach not only outperforms existing augmentation techniques in terms of generalization but also offers improved time complexity, making it highly suitable for real-world applications. Our code is publicly available at: https://github.com/abbahaddou/GRATIN. ... Empirical Validation. Through experiments on real-world datasets we confirm GRATIN to be a fast, high-performing graph augmentation scheme in practice. ... In this section, we present our results and analysis. Our experimental setup is described in Appendix J. |
| Researcher Affiliation | Academia | 1LIX, École Polytechnique, IP Paris 2Université Paris-Saclay, CentraleSupélec, Inria 3NYU Tandon School of Engineering 4MBZUAI. Correspondence to: Yassine Abbahaddou <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Graph classification with GRATIN Inputs: GNN of T layers f(·, θ) = Ψ ∘ READOUT ∘ g, where g is the composition of message passing layers, i.e., g = ∘_{t=0}^{T} {AGGREGATE(t) ∘ COMBINE(t)(·)}, and Ψ is the post-readout function; graph classification dataset D; loss function L. Steps: 1. Train GNN f on the training set Dtrain; 2. Use the trained message passing layers and the readout function to generate graph representations H = {h_Gn s.t. Gn ∈ Dtrain} for the training set; 3. Partition the training set Dtrain by classes, such that Dtrain = ∪_c Dc where Dc = {Gn ∈ Dtrain, yn = c}; foreach c ∈ {0, . . . , C} do 3.1. Fit a GMM distribution pc on the graph representations Hc = {h_Gn s.t. Gn ∈ Dc}; 3.2. Sample new graph representations H̃c = {h̃ s.t. h̃ ∼ pc} from the distribution pc; 3.3. Combine the sampled representations H̃c with the trained representations: Hc = Hc ∪ H̃c; end foreach 4. Finetune the post-readout function Ψ on the graph classification task directly on the new training set H = ∪_c Hc; |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/abbahaddou/GRATIN. ... In this section, we detail the experimental setup for the conducted experiments. The necessary code to reproduce all our experiments is available on github at: https://github.com/abbahaddou/GRATIN |
| Open Datasets | Yes | We evaluate our model on five widely used datasets from the GNN literature, specifically IMDB-BIN, IMDB-MUL, PROTEINS, MUTAG, and DD, all sourced from the TUD Benchmark (Morris et al., 2020). These datasets consist of either molecular or social graphs. Detailed statistics for each dataset are provided in Table 10. |
| Dataset Splits | Yes | We split the dataset into train/test/validation set by 80%/10%/10% and use 10-fold cross-validation for evaluation following the recent work of Zeng et al. (2024). |
| Hardware Specification | Yes | The experiments were conducted on an RTX A6000 GPU. |
| Software Dependencies | No | We used the PyTorch Geometric (PyG) open-source library, licensed under MIT (Fey & Lenssen, 2019). ... using the Adam optimizer (Kingma & Ba, 2014). |
| Experiment Setup | Yes | The GNN was trained on graph classification tasks for 300 epochs with a learning rate of 10⁻² using the Adam optimizer (Kingma & Ba, 2014). To model the graph representations of each class, we fit a GMM using the EM algorithm, running for 100 iterations or until the average lower bound gain dropped below 10⁻³. The number of Gaussians used in the GMM is provided in Table 11. After generating new graph representations from each GMM, we fine-tuned the post-readout function for 100 epochs, maintaining the same learning rate of 10⁻². ... We utilized two GNN architectures, GIN and GCN, both consisting of two layers with a hidden dimension of 32. |
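The augmentation loop quoted in the Pseudocode row (fit a per-class GMM on graph-level representations, sample synthetic representations, merge them with the originals) can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes graph representations are already available as a NumPy array (i.e., the output of the trained readout, before the post-readout classifier Ψ), and it uses scikit-learn's `GaussianMixture` with the EM settings quoted in the Experiment Setup row (100 iterations, tolerance 10⁻³). The function name `augment_with_gmm` and the per-class sample count are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def augment_with_gmm(embeddings, labels, n_components=2,
                     n_samples_per_class=50, seed=0):
    """Per-class GMM augmentation of graph-level representations.

    embeddings: (N, d) array of graph representations (readout output).
    labels:     (N,) integer class labels.
    Returns (X_aug, y_aug) containing the originals plus GMM samples.
    """
    aug_X, aug_y = [embeddings], [labels]
    for c in np.unique(labels):
        Xc = embeddings[labels == c]
        # Step 3.1: fit a GMM p_c on this class's representations
        # (EM, capped at 100 iterations or lower-bound gain < 1e-3).
        gmm = GaussianMixture(n_components=min(n_components, len(Xc)),
                              max_iter=100, tol=1e-3, random_state=seed)
        gmm.fit(Xc)
        # Step 3.2: sample new representations h~ from p_c.
        Xs, _ = gmm.sample(n_samples_per_class)
        # Step 3.3: merge sampled with trained representations.
        aug_X.append(Xs)
        aug_y.append(np.full(len(Xs), c))
    return np.concatenate(aug_X), np.concatenate(aug_y)
```

Step 4 of the algorithm would then fine-tune only the post-readout function Ψ on `(X_aug, y_aug)`, which is what makes the scheme cheap relative to augmentations that regenerate and re-encode whole graphs.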