How Expressive are Knowledge Graph Foundation Models?
Authors: Xingyue Huang, Pablo Barcelo, Michael M. Bronstein, Ismail Ilkan Ceylan, Mikhail Galkin, Juan L Reutter, Miguel Romero Orth
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains. |
| Researcher Affiliation | Collaboration | 1University of Oxford 2Universidad Católica de Chile 3IMFD 4CENIA 5AITHYRA 6Google. Correspondence to: Xingyue Huang <University of Oxford>. |
| Pseudocode | Yes | A.1. Model definitions. C-MPNNs. Let G = (V, E, R) be a KG. A conditional message passing neural network (C-MPNN) iteratively computes node representations, relative to a fixed query q ∈ R and a fixed source node u ∈ V, as follows: h^(0)\_{v\|u,q} = INIT(u, v, q); h^(ℓ+1)\_{v\|u,q} = UP(h^(ℓ)\_{v\|u,q}, AGG({{MSG_r(h^(ℓ)\_{w\|u,q}, z_q) \| w ∈ N_r(v), r ∈ R}})), where INIT, UP, AGG, and MSG_r are differentiable initialization, update, aggregation, and relation-specific message functions, respectively. |
| Open Source Code | Yes | The code is available at https://github.com/HxyScotthuang/MOTIF/. |
| Open Datasets | Yes | For pretraining and fine-tuning experiments, we follow the protocol of Galkin et al. (2024) and pretrain on FB15k237 (Toutanova & Chen, 2015), WN18RR (Dettmers et al., 2018), and CoDEx Medium (Safavi & Koutra, 2020). We then apply zero-shot inference and fine-tuned inference over 51 KGs across three settings: inductive on nodes and relations (Inductive e, r), inductive on nodes (Inductive e), and Transductive. The detailed information of datasets, model architectures, implementations, and hyper-parameters used in the experiments are presented in Appendix M. Following convention (Zhu et al., 2021), on each knowledge graph and for each triplet r(u, v), we augment the correspond- ... |
| Dataset Splits | Yes | The detailed information of datasets, model architectures, implementations, and hyper-parameters used in the experiments are presented in Appendix M. ... Table 13. Dataset statistics for inductive-e, r link prediction datasets. Triples are the number of edges given at training, validation, or test graphs, respectively, whereas Valid and Test denote triples to be predicted in the validation and test graphs. |
| Hardware Specification | Yes | All experiments were performed on the FB15k-237 dataset using a batch size of 64 on a single NVIDIA H100 GPU. |
| Software Dependencies | No | The paper mentions PyTorch Geometric, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 16. MOTIF hyper-parameters for pretraining, fine-tuning, and training from end to end. Table 17. Hyperparameters for fine-tuning MOTIF and from end to end. ... We use ULTRA with sum aggregation on both relational and entity levels, and similarly in all variants of MOTIF(Fstar m ). The message function on the entity level is chosen to be element-wise summation to avoid loss of information from relation types during the first message passing step. We use 2 layers for both ULTRA and all variants of MOTIF(Fstar m ), each with dimension 32 for both the relation and entity models, and the Adam optimizer is used with a learning rate of 0.001, trained for 500 epochs in all experiments. |
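The C-MPNN recursion quoted under "Pseudocode" can be illustrated with a minimal NumPy sketch. This is not the paper's implementation (which uses PyTorch Geometric): the indicator-style `init`, the linear `MSG_r`/`UP` maps, the toy graph, and all dimensions are illustrative assumptions; only the message-passing structure (source/query-conditioned initialization, relation-specific messages, sum aggregation, 2 layers) follows the excerpts above.

```python
import numpy as np

def init(u, v, q, dim=4):
    # INIT(u, v, q): mark the source node u, conditioned on the query relation q
    # (an assumed indicator-style initialization, not the paper's exact choice).
    h = np.zeros(dim)
    if v == u:
        h[q % dim] = 1.0
    return h

def cmpnn_layer(h, edges, z_q, W_r, W_up):
    """One step: h'(v) = UP(h(v), sum over r(w, v) of MSG_r(h(w), z_q)).

    Sum aggregation mirrors the experiment-setup excerpt; MSG_r and UP are
    toy linear maps with a tanh nonlinearity, chosen for illustration only.
    """
    agg = {v: np.zeros_like(x) for v, x in h.items()}
    for (w, r, v) in edges:                # edge r(w, v): message flows w -> v
        agg[v] += W_r[r] @ (h[w] + z_q)    # MSG_r: relation-specific linear map
    return {v: np.tanh(W_up @ (h[v] + agg[v])) for v in h}  # UP

# Toy KG: nodes {0, 1, 2}, relations {0, 1}, source u = 0, query q = 0.
rng = np.random.default_rng(0)
dim = 4
edges = [(0, 0, 1), (1, 1, 2)]
W_r = {r: rng.normal(size=(dim, dim)) for r in (0, 1)}
W_up = rng.normal(size=(dim, dim))
z_q = rng.normal(size=dim)                 # learnable query embedding (assumed)

h = {v: init(u=0, v=v, q=0, dim=dim) for v in (0, 1, 2)}
for _ in range(2):                         # 2 layers, as in the experiment setup
    h = cmpnn_layer(h, edges, z_q, W_r, W_up)
```

After two layers, node 2's representation depends on the source node 0 through the relational path 0 → 1 → 2, which is the conditioning behavior the definition formalizes.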