A Self-Explainable Heterogeneous GNN for Relational Deep Learning
Authors: Francesco Ferrini, Antonio Longa, Andrea Passerini, Manfred Jaeger
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that in the context of relational databases, our approach effectively identifies informative meta-paths that faithfully capture the model's reasoning mechanisms. It significantly outperforms existing methods in both synthetic and real-world scenarios. Our experimental evaluation seeks to address the following research questions: Q1 Can MPS-GNN recover the correct meta-path when increasing the setting complexity? Q2 Does MPS-GNN outperform existing approaches in tasks over real-world relational databases? Q3 Is MPS-GNN self-explainable? We compared MPS-GNN with approaches that don't require predefined meta-paths, handle numerous relations, and incorporate node features in learning. The identified competitors include: MLP, to test the sufficiency of target node features alone; GCN (Kipf & Welling, 2016), a baseline non-relational model; RGCN (Schlichtkrull et al., 2017), extending GCN to multi-relational graphs with distinct parameters for each edge type; HGN (Lv et al., 2021a), a heterogeneous GNN model extending GAT to multiple relations; GTN (Yun et al., 2019a), which transforms input graphs into different meta-path graphs on which node representations are learned; Fast-GTN (Yun et al., 2022b), an optimized GTN variant; R-HGNN (Yu et al., 2021), a relation-aware GNN using cross-relation message passing; and MP-GNN (Ferrini et al., 2024), the original meta-path GNN supporting only existentially quantified meta-paths. We implemented our model using PyTorch Geometric, and used the competitors' code from their respective papers for comparison. For training MPS-GNN, we used a 70/20/10 split for training, validation, and testing, respectively, and reported the test results for the model selected based on its validation performance. We employed the F1 score as the evaluation metric to account for the class imbalance in many of the datasets.
The paper includes Table 1 and Table 2, which report F1 results for synthetic and real-world datasets, respectively. |
| Researcher Affiliation | Academia | Francesco Ferrini EMAIL University of Trento, Italy Antonio Longa EMAIL University of Trento, Italy Andrea Passerini EMAIL University of Trento, Italy Manfred Jaeger EMAIL Aalborg University, Denmark |
| Pseudocode | Yes | Algorithm 1 outlines the whole MPS-GNN procedure for the single meta-path case (in practice, a beam search of width K is used and multiple meta-paths are learned). Algorithm 1: MPS-GNN learning procedure, learn-MPS-GNN(G, R, Y, L_MAX, η) |
| Open Source Code | Yes | The code is freely available at https://github.com/francescoferrini/MPS-GNN |
| Open Datasets | Yes | Our approach is particularly useful for predictive tasks in relational databases with multiple tables, where features for a target entity may involve statistics from related tables. To address the second research question, we thus focused on three relational databases with many tables: EICU, a medical database with 31 tables, where we predict patient stay duration in the eICU, modeled as binary node classification by thresholding duration at 20 hours to achieve two balanced classes; MONDIAL, a geographic database where the task is predicting whether a country's religion is Christian; and Ergast F1, containing Formula 1 data, where the task is predicting the winner of a race in a binary classification task where target nodes are represented by a combination of race and pilot. EICU: medical database with 31 tables (node types) from Johnson et al. (2021). URL https://eicu-crd.mit.edu MONDIAL: database containing data from multiple geographical web data sources (May, 1999). URL http://dbis.informatik.uni-goettingen.de/Mondial. Ergast F1: database containing Formula 1 races from the 1950 season to the present day. URL https://relational-data.org/dataset/ErgastF1 Recently, a novel benchmark, rel-bench (Robinson et al., 2024), has been introduced. |
| Dataset Splits | Yes | For training MPS-GNN, we used a 70/20/10 split for training, validation, and testing, respectively, and reported the test results for the model selected based on its validation performance. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions execution times in Table 10 without specifying the hardware on which these times were measured. |
| Software Dependencies | No | We implemented our model using PyTorch Geometric, and used the competitors' code from their respective papers for comparison. The optimizer is omitted from the table as it is Adam for all models. lr denotes the learning rate, wd represents the weight decay, and Patience indicates the early stopping patience (if applicable). The paper mentions software such as PyTorch Geometric and Adam but does not specify version numbers for these or any other key software components. |
| Experiment Setup | Yes | Table 8: Hyperparameters of competitors and MPS-GNN for the real-world datasets. The optimizer is omitted from the table as it is Adam for all models. lr denotes the learning rate, wd represents the weight decay, and Patience indicates the early stopping patience (if applicable). # layers, Embedding dim., lr, wd, # epochs, Patience, and Loss are listed for each model, including MPS-GNN. |
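The evaluation protocol quoted above (a shuffled 70/20/10 train/validation/test split and the F1 score to account for class imbalance) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: only the split ratios and the choice of F1 come from the paper, while the function names, the shuffling seed, and the use of plain Python lists are assumptions.

```python
import random

def split_70_20_10(n, seed=0):
    """Shuffle n node indices and split them 70/20/10 into
    train/val/test. The ratios match the paper's protocol;
    the shuffling strategy and seed are illustrative assumptions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(0.7 * n)
    n_val = int(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def f1_score(y_true, y_pred):
    """Binary F1 = harmonic mean of precision and recall, the metric
    the paper reports to handle imbalanced datasets."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 100 target nodes split 70/20/10, then F1 on a toy prediction.
train, val, test = split_70_20_10(100)
print(len(train), len(val), len(test))          # 70 20 10
print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))     # 0.5
```

In practice a model would be selected on the validation split and its F1 reported on the test split, matching the "model selected based on its validation performance" statement in the Dataset Splits row.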