Large Language Model Meets Graph Neural Network in Knowledge Distillation

Authors: Shengxiang Hu, Guobing Zou, Song Yang, Shiyi Lin, Yanglan Gan, Bofeng Zhang, Yixin Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that LinguGKD outperforms existing graph distillation frameworks; the distilled simple GNNs achieve comparable or superior performance to more complex GNNs and teacher LLMs while maintaining computational efficiency. ... Through extensive experimental evaluations across diverse LLM-GNN combinations and multiple benchmark datasets, we demonstrate that LinguGKD significantly enhances GNN accuracy while maintaining a lightweight model structure. ... Experimental Setup: Datasets and Model Selection. We evaluated our LinguGKD framework on three widely-used benchmark datasets for node classification: Cora, PubMed (Yang, Cohen, and Salakhudinov 2016), and Arxiv (Hu et al. 2020).
Researcher Affiliation | Academia | 1. School of Computer Engineering and Science, Shanghai University, Shanghai, China; 2. School of Computer Science and Technology, Donghua University, Shanghai, China; 3. School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai, China; 4. Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA.
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluated our LinguGKD framework on three widely-used benchmark datasets for node classification: Cora, PubMed (Yang, Cohen, and Salakhudinov 2016), and Arxiv (Hu et al. 2020).
Dataset Splits | Yes | Table 1: Dataset Statistics ... |Dtr| : |Dval| : |Dtest| = 6:2:2, 6:2:2, and 5.4:1.8:2.8.
Hardware Specification | No | The paper reports 'Inference Time (s)' in Figure 2, but it does not explicitly specify the hardware (e.g., specific GPU or CPU models) used to run the experiments.
Software Dependencies | No | The paper mentions models such as 'Llama2-7B (Touvron et al. 2023)' and 'Llama3-8B (Dubey et al. 2024)' as teacher LLMs, and 'GCN (Kipf and Welling 2017)', 'GAT (Veličković et al. 2018)', 'GraphSAGE (Hamilton, Ying, and Leskovec 2017)', and 'GIN (Xu et al. 2018)' as student GNNs. However, it does not specify software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | The student GNN is trained end-to-end using mini-batch AdamW optimization, effectively balancing the transfer of rich semantic knowledge from the teacher LLM with the structural learning capabilities inherent to GNNs for the specific downstream task. ... We conducted an extensive analysis of two critical hyperparameters: neighbor orders (k) and hidden feature dimensions (dG) of GNNs, using the Cora dataset as our benchmark. ... Based on these findings, we identified the 2-hop setting and 1024-dimensional hidden features as optimal, balancing performance and efficiency for LinguGKD. ... The dataset-specific adaptation patterns evidence the framework's ability to capture graph characteristics: higher first-order neighbor factors (γ1) in Cora indicate dominance of local topology, while elevated structure-free features (γ0) in PubMed suggest stronger dependence on node attributes. This automatic adaptation, combined with balanced optimization of distillation (β) and task-specific (α) objectives, demonstrates LinguGKD's advantage in maintaining knowledge fidelity across heterogeneous graphs.
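The setup quoted above balances a task-specific objective (weighted by α) against a distillation objective (weighted by β). A minimal pure-Python sketch of that weighting is shown below; the cross-entropy task loss, temperature-softened KL distillation loss, and the default values of alpha, beta, and temperature are all illustrative assumptions, since the paper's actual loss equations are not reproduced in this excerpt.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Task-specific loss: negative log-likelihood of the true class."""
    return -math.log(probs[label] + 1e-12)

def kl_divergence(p, q):
    """Distillation loss: KL(teacher || student) on softened distributions."""
    return sum(pi * math.log((pi + 1e-12) / (qi + 1e-12))
               for pi, qi in zip(p, q))

def combined_loss(student_logits, teacher_logits, label,
                  alpha=0.5, beta=0.5, temperature=2.0):
    """Weighted sum of the task objective (alpha) and the
    teacher-to-student distillation objective (beta). The specific
    forms and weights here are hypothetical placeholders."""
    student_hard = softmax(student_logits)            # for the task loss
    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    task = cross_entropy(student_hard, label)
    distill = kl_divergence(teacher_soft, student_soft)
    return alpha * task + beta * distill
```

In a real pipeline this scalar would be computed per mini-batch and minimized with AdamW, as the quoted setup describes; here it only illustrates how the α/β trade-off enters the objective.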