Griffin: Towards a Graph-Centric Relational Database Foundation Model

Authors: Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, Muhan Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains (spanning over 150 million nodes), Griffin achieves performance superior or comparable to individually trained models, excels in low-data scenarios, and shows strong transferability to new datasets and tasks when pretrained on similar and diverse data, highlighting its potential as a universally applicable foundation model for RDBs.
Researcher Affiliation | Collaboration | (1) Institute for Artificial Intelligence, Peking University; (2) Amazon Web Services. Correspondence to: Muhan Zhang <EMAIL>.
Pseudocode | No | The paper describes the model design and training pipeline in detail using textual descriptions, mathematical formulas, and figures. However, there are no explicitly labeled pseudocode or algorithm blocks presenting structured steps.
Open Source Code | Yes | Code available at github.com/yanxwb/Griffin.
Open Datasets | Yes | We sourced large-scale temporal RDBs from two leading benchmarks, 4DBInfer (Wang et al., 2024) and RelBench (Robinson et al., 2024), covering a wide range of domains, scales, and tasks. A total of 24 tasks were selected for SFT and downstream evaluation. Single-table datasets: over 200 datasets were curated from TP-BERTa (Yan et al., 2024) and CARTE (Kim et al., 2024) on Hugging Face.
Dataset Splits | Yes | To ensure robustness, each task was evaluated across five different random seeds for split selection. Limited-sample SFT: fine-tuning with a restricted subset of 4096 samples.
Hardware Specification | Yes | The experiments were conducted on an AWS g6.48x instance, ensuring sufficient computational resources for large-scale graph-based training.
Software Dependencies | No | The paper mentions using a pre-trained text encoder (Nussbaum et al., 2024) and states that the sentence embedding model was based on Nomic embeddings, but it does not provide specific version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | Yes | For optimization and training, we employed the AdamW optimizer with a learning rate of 3e-4 and an L2-norm regularization of 2e-4. A batch size of 256 was used for all training runs. Early stopping was applied with a patience of 10 epochs to prevent overfitting, ensuring stable convergence. No additional learning rate scheduler or gradient clipping was used. The model architecture was designed with a hidden dimension of 512, maintaining consistency between different components. The sentence embedding model was based on Nomic embeddings, truncated to 512 dimensions. The cross-attention module included 8 attention heads and a dropout rate of 0.1, allowing for effective feature extraction while preventing overfitting. SiLU was chosen as the activation function across all layers. For graph construction and sampling, we adopted a 4-layer message-passing neural network (MPNN) with 2-layer uniform sampling on temporal neighbors. The fanout was set to 20 per layer to ensure a balanced trade-off between computational efficiency and capturing structural information. Additionally, reversed edges were incorporated into the sampled subgraph to improve relational modeling.
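The reported hyperparameters can be collected into a single configuration sketch. This is illustrative only: the key names below are assumptions, not the authors' actual config schema, and the paper does not publish a config file.

```python
# Hedged sketch: hyperparameters reported in the Experiment Setup row,
# gathered into one dict. Key names are hypothetical.
GRIFFIN_SFT_CONFIG = {
    # Optimization
    "optimizer": "AdamW",
    "learning_rate": 3e-4,
    "weight_decay": 2e-4,            # L2-norm regularization
    "batch_size": 256,
    "early_stopping_patience": 10,   # epochs
    "lr_scheduler": None,            # no scheduler used
    "gradient_clipping": None,       # no clipping used

    # Architecture
    "hidden_dim": 512,
    "text_embedding": "nomic",       # Nomic embeddings, truncated
    "text_embedding_dim": 512,
    "cross_attention_heads": 8,
    "dropout": 0.1,
    "activation": "SiLU",

    # Graph construction and sampling
    "mpnn_layers": 4,
    "sampling_layers": 2,            # uniform sampling on temporal neighbors
    "fanout_per_layer": 20,
    "add_reversed_edges": True,
}

def max_sampled_neighbors(layers: int, fanout: int) -> int:
    """Upper bound on sampled nodes per seed node: fanout + fanout^2 + ... + fanout^layers."""
    return sum(fanout ** k for k in range(1, layers + 1))

# With 2-layer sampling and a fanout of 20, each seed node pulls in
# at most 20 + 400 = 420 neighbors.
```

The helper makes the cost of the stated sampling choice concrete: the 2-layer, fanout-20 setting bounds each seed node's subgraph at 420 sampled neighbors, which is what keeps the trade-off between efficiency and structural coverage tractable at the paper's 150M-node scale.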