GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our GOFA model is evaluated on various downstream datasets unseen during the pre-training and fine-tuning phases, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA. |
| Researcher Affiliation | Academia | 1: Washington University in St. Louis; 2: Peking University (author emails redacted) |
| Pseudocode | No | The paper describes mathematical formulations and architectural designs (e.g., Equations 1, 2, 3, and Figure 3) and implementation details, but does not include a distinct section or figure explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | The code is available at https://github.com/JiaruiFeng/GOFA. |
| Open Datasets | Yes | The training datasets include MAG240M (Hu et al., 2021a), Pubmed and Arxiv (Hu et al., 2021b) for academic knowledge, WikiKG90Mv2 (Hu et al., 2021a) and Wiki Graph (proposed by us) for semantic diversity, and the UltraChat200k (Ding et al., 2023) dataset for question-answering ability. The pre-trained model is further instruction fine-tuned to obtain the task-solving ability. Our GOFA model is evaluated on various downstream datasets unseen during the pre-training and fine-tuning phases. |
| Dataset Splits | Yes | For the node-level task [Cora], the aim is to classify the node into the correct paper category from 7 different categories. The split is obtained from OFA. It contains 140/500/2068 samples for train/val/test set respectively. For the link-level task, the object is to predict whether two paper nodes are co-cited or not. We follow the setting of OFA (Liu et al., 2023a) and randomly split all edges into train/val/test sets with a ratio of 0.85/0.05/0.1. |
| Hardware Specification | Yes | The training is conducted on 8 NVIDIA A100 SXM4 80GB GPUs with DeepSpeed stage-2 (Rajbhandari et al., 2020) parallelism. |
| Software Dependencies | No | Both GOFA and all baselines are implemented in Python using the PyTorch, transformers, and PyG (Fey & Lenssen, 2019) packages. The training is conducted on 8 NVIDIA A100 SXM4 80GB GPUs with DeepSpeed stage-2 (Rajbhandari et al., 2020) parallelism. Specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | The detailed training parameters are set the same for both models and are listed in Table 12. We use the AdamW optimizer with β = (0.9, 0.95). We use a cosine annealing learning rate scheduler, and the minimum learning rate is 10% of the initial learning rate. Table 12: Hyper-parameters for pretraining. lr: 0.0001, weight_decay: 0.1, batch_size: 8, dropout: 0.0, grad_clip: 0.5, gradient_accum: 8, llm_max_length: 128, optimizer: AdamW. |
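The link-level split described above (random 0.85/0.05/0.1 partition of all edges) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `split_edges` and the fixed seed are assumptions for reproducibility of the sketch.

```python
import random

def split_edges(edges, ratios=(0.85, 0.05, 0.10), seed=0):
    """Randomly partition an edge list into train/val/test sets.

    `ratios` follows the 0.85/0.05/0.1 split quoted in the table;
    the seed handling is illustrative, not the paper's implementation.
    """
    edges = list(edges)
    rng = random.Random(seed)
    rng.shuffle(edges)
    n = len(edges)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = edges[:n_train]
    val = edges[n_train:n_train + n_val]
    test = edges[n_train + n_val:]
    return train, val, test

# Example: 1000 edges -> 850 train, 50 val, 100 test.
train, val, test = split_edges([(i, i + 1) for i in range(1000)])
print(len(train), len(val), len(test))  # 850 50 100
```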
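The learning-rate schedule in the experiment setup (cosine annealing from the initial lr down to 10% of it) can be written as a small closed-form helper. This is a sketch of the schedule shape only, not the paper's training loop; in PyTorch the equivalent would be `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=0.1 * lr`.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr_frac=0.1):
    """Cosine-annealed learning rate decaying from base_lr to
    min_lr_frac * base_lr over total_steps, matching the 10%-floor
    schedule described in the setup."""
    min_lr = min_lr_frac * base_lr
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos

# Starts at the initial lr (0.0001) and ends at 10% of it (0.00001).
start = cosine_lr(0, 1000)     # 1e-4
end = cosine_lr(1000, 1000)    # 1e-5
```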