GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our GOFA model is evaluated on various downstream datasets unseen during the pre-training and fine-tuning phases, demonstrating a strong ability to solve structural and contextual problems in zero-shot scenarios. The code is available at https://github.com/JiaruiFeng/GOFA. |
| Researcher Affiliation | Academia | 1: Washington University in St. Louis; 2: Peking University (author emails redacted) |
| Pseudocode | No | The paper describes mathematical formulations and architectural designs (e.g., Equations 1, 2, 3, and Figure 3) and implementation details, but does not include a distinct section or figure explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | The code is available at https://github.com/JiaruiFeng/GOFA. |
| Open Datasets | Yes | The training datasets include MAG240M (Hu et al., 2021a), Pubmed and Arxiv (Hu et al., 2021b) for academic knowledge, WikiKG90Mv2 (Hu et al., 2021a) and Wiki Graph (proposed by us) for semantic diversity, and the UltraChat200k (Ding et al., 2023) dataset for question-answering ability. The pre-trained model is further instruction fine-tuned to obtain the task-solving ability. Our GOFA model is evaluated on various downstream datasets unseen during the pre-training and fine-tuning phases. |
| Dataset Splits | Yes | For the node-level task [Cora], the aim is to classify the node into the correct paper category from 7 different categories. The split is obtained from OFA. It contains 140/500/2068 samples for train/val/test set respectively. For the link-level task, the object is to predict whether two paper nodes are co-cited or not. We follow the setting of OFA (Liu et al., 2023a) and randomly split all edges into train/val/test sets with a ratio of 0.85/0.05/0.1. |
| Hardware Specification | Yes | The training is conducted on 8 NVIDIA A100 SXM4 80GB GPUs with DeepSpeed stage-2 (Rajbhandari et al., 2020) parallelism. |
| Software Dependencies | No | Both GOFA and all baselines are implemented in Python using the PyTorch, transformers, and PyG (Fey & Lenssen, 2019) packages. The training is conducted on 8 NVIDIA A100 SXM4 80GB GPUs with DeepSpeed stage-2 (Rajbhandari et al., 2020) parallelism. Specific version numbers for these software dependencies are not provided. |
| Experiment Setup | Yes | The detailed training parameters are set the same for both models and are listed in Table 12. We use the AdamW optimizer with β = (0.9, 0.95). We use a cosine annealing learning rate scheduler, and the minimum learning rate is 10% of the initial learning rate. Table 12: Hyper-parameters for pretraining. lr: 0.0001, weight_decay: 0.1, batch_size: 8, dropout: 0.0, grad_clip: 0.5, gradient_accum: 8, llm_max_length: 128, optimizer: AdamW. |
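The link-level split described above (random 0.85/0.05/0.1 partition of all edges) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name `split_edges` and the fixed seed are assumptions for reproducibility of the sketch.

```python
import random

def split_edges(edges, ratios=(0.85, 0.05, 0.10), seed=0):
    """Randomly partition an edge list into train/val/test sets.

    `ratios` follows the 0.85/0.05/0.1 split quoted in the table;
    the seed handling is illustrative, not the paper's implementation.
    """
    edges = list(edges)
    rng = random.Random(seed)
    rng.shuffle(edges)
    n = len(edges)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = edges[:n_train]
    val = edges[n_train:n_train + n_val]
    test = edges[n_train + n_val:]
    return train, val, test

# Example: 1000 edges -> 850 train, 50 val, 100 test.
train, val, test = split_edges([(i, i + 1) for i in range(1000)])
print(len(train), len(val), len(test))  # 850 50 100
```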
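The learning-rate schedule in the experiment setup (cosine annealing from the initial lr down to 10% of it) can be written as a small closed-form helper. This is a sketch of the schedule shape only, not the paper's training loop; in PyTorch the equivalent would be `torch.optim.lr_scheduler.CosineAnnealingLR` with `eta_min=0.1 * lr`.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-4, min_lr_frac=0.1):
    """Cosine-annealed learning rate decaying from base_lr to
    min_lr_frac * base_lr over total_steps, matching the 10%-floor
    schedule described in the setup."""
    min_lr = min_lr_frac * base_lr
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos

# Starts at the initial lr (0.0001) and ends at 10% of it (0.00001).
start = cosine_lr(0, 1000)     # 1e-4
end = cosine_lr(1000, 1000)    # 1e-5
```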