Generalization Principles for Inference over Text-Attributed Graphs with Large Language Models
Authors: Haoyu Peter Wang, Shikun Liu, Rongzhe Wei, Pan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on 11 real-world TAG benchmarks demonstrate that LLM-BP significantly outperforms existing approaches, achieving 8.10% improvement with task-conditional embeddings and an additional 1.71% gain from adaptive aggregation. The code and task-adaptive embeddings are publicly available. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. Correspondence to: Haoyu Wang <EMAIL>, Pan Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 (LLM-BP). Input: TAG (G, X); Output: class label predictions {ŷ_i}, i ∈ [n]. 1: h ← task-adaptive encoding of X following Eq. (1); 2: if zero-shot then 3: sample l ≪ n nodes and infer their labels with LLMs; 4: cluster nodes based on the LLM predictions; 5: q_C ← average embedding of samples near each cluster center; 6: else if few-shot then 7: q_C ← average embedding of the k labeled samples per class; 8: end if; 9: estimate ψ_ij(y_i, y_j) by employing the LLM to analyze the graph data (e.g., using Eq. (6) based on the estimated homophily level r̂); 10: initialize p^(0)(y_i) via Eq. (5) and m^(0)_{i→j}(y_j) = 1; 11: run LLM-BP (Eq. (4)) for L iterations, or its approximation (Eq. (7)) for a single iteration; 12: ŷ_i ← argmax_{y_i} log p^(k)_i(y_i; x_i). |
| Open Source Code | Yes | The code and task-adaptive embeddings are publicly available. https://github.com/Graph-COM/LLM_BP |
| Open Datasets | Yes | Evaluations on 11 real-world TAG benchmarks demonstrate that LLM-BP significantly outperforms existing approaches... Table 1: TAG Datasets selected in experiments. Cora (McCallum et al., 2000) ... Citeseer (Giles et al., 1998) ... Pubmed (Sen et al., 2008) ... History (Ni et al., 2019) ... Children (Ni et al., 2019) ... Sportsfit (Ni et al., 2019) ... Wikics (Mernyei & Cangea, 2020) ... Cornell (Craven et al., 1998) ... Texas (Craven et al., 1998) ... Wisconsin (Craven et al., 1998) ... Washington (Craven et al., 1998). For the datasets (all the homophily graphs) that have been studied in TSGFM (Chen et al., 2024d), we follow their implementation to perform data pre-processing, obtain raw texts, and produce data splits; the introduction to the data sources can be found in Appendix D.2 of their original paper, and the code is available at https://github.com/CurryTang/TSGFM/tree/master?tab=readme-ov-file. As for the heterophily graphs, the four datasets are originally from (Craven et al., 1998); we obtain the raw texts from (Yan et al., 2023), available at https://github.com/sktsherlock/TAG-Benchmark/tree/master |
| Dataset Splits | Yes | Additionally, a few-shot setting is considered, where k labeled nodes are known for each class... As to data splits, for zero-shot inference, all nodes are marked as test data; for the few-shot setting, k labeled nodes are randomly sampled per class and the rest are marked as test data... We uniformly randomly sample 20c nodes per graph, where c denotes the number of classes... Using 10 different random seeds, we sample the shots from the training set and repeat the experiments 10 times. |
| Hardware Specification | Yes | All the local experiments run on a server with an AMD EPYC 7763 64-core processor and eight NVIDIA RTX 6000 Ada GPU cards. |
| Software Dependencies | No | Methods are mainly implemented with PyTorch (Paszke et al., 2019), PyTorch Geometric (Fey & Lenssen, 2019), and Hugging Face Transformers (Wolf et al., 2019). |
| Experiment Setup | Yes | Hyper-parameters for the BP algorithm: for LLM-BP, we adopt 5 message-passing layers; for its linear approximation form, we use a single layer. The temperature hyper-parameter τ used in computing the node-potential initialization in Eq. (5) is set to 0.025 for LLM-BP and 0.01 for LLM-BP (appr.) across all datasets. |
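The Algorithm 1 and Experiment Setup rows above can be illustrated with a minimal, self-contained sketch of loopy belief propagation. This is a hypothetical instance, not the authors' implementation: the temperature-scaled softmax stands in for the node-potential initialization of Eq. (5), the homophily-based edge potential (ψ(y_i, y_j) = r if y_i = y_j, else (1 − r)/(C − 1)) stands in for Eq. (6), and the graph, logits, and function names are all illustrative.

```python
import math

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax: small tau (e.g. 0.025) sharpens the
    # distribution, mimicking the node-potential initialization.
    m = max(l / tau for l in logits)
    exps = [math.exp(l / tau - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def bp_sketch(edges, node_logits, tau=0.025, r=0.8, L=5):
    """Loopy BP on an undirected graph (hypothetical sketch).

    edges: list of (i, j) node pairs; node_logits: per-node class
    scores (stand-ins for class-embedding similarities); r: assumed
    homophily level defining the edge potential.
    """
    n, C = len(node_logits), len(node_logits[0])
    phi = [softmax(l, tau) for l in node_logits]  # node potentials
    nbrs = [[] for _ in range(n)]
    msg = {}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
        msg[(i, j)] = [1.0 / C] * C  # messages start uniform
        msg[(j, i)] = [1.0 / C] * C
    for _ in range(L):
        new = {}
        for (i, j) in msg:
            # m_{i->j}(y_j) ∝ sum_{y_i} phi_i(y_i) * psi(y_i, y_j)
            #                 * prod_{k in N(i)\{j}} m_{k->i}(y_i)
            out = []
            for yj in range(C):
                s = 0.0
                for yi in range(C):
                    psi = r if yi == yj else (1 - r) / (C - 1)
                    p = phi[i][yi] * psi
                    for k in nbrs[i]:
                        if k != j:
                            p *= msg[(k, i)][yi]
                    s += p
                out.append(s)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msg = new
    preds = []
    for i in range(n):
        b = list(phi[i])  # final belief: potential times incoming messages
        for k in nbrs[i]:
            b = [bi * mi for bi, mi in zip(b, msg[(k, i)])]
        preds.append(max(range(C), key=lambda y: b[y]))
    return preds
```

On a 3-node chain where the two endpoints strongly prefer class 0 and the middle node is ambiguous, a homophilous edge potential pulls the middle node toward class 0, which is the intended effect of the aggregation step.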
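The few-shot split protocol in the Dataset Splits row (sample k labeled nodes per class under a random seed, mark the rest as test, and repeat over seeds) can be sketched as follows; the helper name and its arguments are hypothetical.

```python
import random

def few_shot_split(labels, k, seed):
    """Pick k labeled nodes per class; the rest form the test set.

    labels: list mapping node index -> class id. Returns
    (shot_indices, test_indices), both sorted. Hypothetical helper
    mirroring the described protocol.
    """
    rng = random.Random(seed)  # seeded so each repeat is reproducible
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    shots = []
    for y in sorted(by_class):
        shots += rng.sample(by_class[y], k)  # k shots for this class
    shot_set = set(shots)
    test = [i for i in range(len(labels)) if i not in shot_set]
    return sorted(shots), test
```

Running this with 10 different seeds reproduces the "repeat the experiments 10 times" loop described above.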