Aggregation Mechanism Based Graph Heterogeneous Networks Distillation

Authors: Xiaobin Hong, Mingkai Lin, Xiangkai Ma, Wenzhong Li, Sanglu Lu

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on 8 standard and 4 large-scale datasets demonstrate that AMEND consistently outperforms state-of-the-art distillation methods." "To fully evaluate the proposed method, we conduct extensive experiments on 8 regular graph datasets and 4 large-scale graph datasets to compare with state-of-the-art methods."
Researcher Affiliation | Academia | Xiaobin Hong, Mingkai Lin, Xiangkai Ma, Wenzhong Li, Sanglu Lu (State Key Laboratory for Novel Software Technology, Nanjing University), EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: AMEND Algorithm
Input: graph G = {V, E}, node feature matrix X, and precomputed position encoding X_pe.
Output: optimized parameters of the student MLP S; predicted node labels Ŷ.
1: Model initialization and dataset partitioning.
2: Pretrain the teacher model T with cross-entropy loss.
3: # Student MLP training
4: for each epoch do
5:     # Aggregation context preservation
6:     Z_T = T(X, E, X_pe);
7:     Z_S = S(X, X_pe);
8:     # Aggregation-enhanced CKA
9:     L_ACKA = ACKA(Z_T, Z_S) in Eq. 7;
10:    # Shared manifold mixup
11:    Z_T^mix = λ·Z_T + (1 − λ)·Z′_T;
12:    Z_S^mix = λ·Z_S + (1 − λ)·Z′_S;
13:    Ŷ_T, Ŷ_S = g_T(Z_T), g_S(Z_S);
14:    Ŷ_T^mix, Ŷ_S^mix = g_T(Z_T^mix), g_S(Z_S^mix);
15:    # Logit distillation
16:    L_logit = L_mix + L_pred in Eq. 11;
17:    # Overall loss computation
18:    L_S = L_task + β·L_ACKA + γ·L_logit in Eq. 12;
19:    Backpropagate gradients and optimize the model.
20: end for
21: return S, Ŷ
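As a rough, self-contained sketch (not the authors' implementation), the per-epoch loss computation in Algorithm 1 can be illustrated in NumPy. Plain linear CKA stands in for the aggregation-enhanced CKA (ACKA, Eq. 7), random matrices stand in for the teacher and student embeddings, and mean-squared error stands in for the logit-distillation terms of Eq. 11; all names and values here are illustrative assumptions.

```python
import numpy as np

def linear_cka(Z1, Z2):
    # Linear CKA similarity between two embedding matrices (rows = nodes);
    # a simplified stand-in for the aggregation-enhanced CKA (ACKA) of Eq. 7.
    Z1 = Z1 - Z1.mean(axis=0)
    Z2 = Z2 - Z2.mean(axis=0)
    num = np.linalg.norm(Z1.T @ Z2, "fro") ** 2
    den = np.linalg.norm(Z1.T @ Z1, "fro") * np.linalg.norm(Z2.T @ Z2, "fro")
    return num / den

def manifold_mixup(Z, lam, perm):
    # Shared manifold mixup: Z_mix = lam * Z + (1 - lam) * Z', where Z' is a
    # permuted copy of Z; mirrors lines 11-12 of Algorithm 1.
    return lam * Z + (1.0 - lam) * Z[perm]

rng = np.random.default_rng(0)
n, d = 64, 16
Z_T = rng.standard_normal((n, d))   # teacher embeddings, Z_T = T(X, E, X_pe)
Z_S = rng.standard_normal((n, d))   # student embeddings, Z_S = S(X, X_pe)

L_acka = 1.0 - linear_cka(Z_T, Z_S)   # similarity loss, value in [0, 1]

lam = float(rng.beta(2.0, 2.0))       # mixup coefficient (assumed Beta prior)
perm = rng.permutation(n)             # shared permutation for both models
Z_T_mix = manifold_mixup(Z_T, lam, perm)
Z_S_mix = manifold_mixup(Z_S, lam, perm)

# MSE stand-ins for the prediction and mixup distillation terms of Eq. 11.
L_pred = float(np.mean((Z_S - Z_T) ** 2))
L_mix = float(np.mean((Z_S_mix - Z_T_mix) ** 2))
L_logit = L_mix + L_pred

L_task = 1.0                          # placeholder cross-entropy value
beta, gamma = 10.0, 0.1               # weights reported in the paper
L_S = L_task + beta * L_acka + gamma * L_logit  # overall loss, Eq. 12
```

The shared permutation for teacher and student is the point of the "shared" manifold mixup: both models mix the same pairs of nodes, so their mixed logits remain comparable.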
Open Source Code | No | The paper does not explicitly state that source code is provided, nor does it give a link to a repository. Phrases such as "we release our code" are not present.
Open Datasets | Yes | "Datasets. To fully evaluate our proposed method, we use 8 public regular graph benchmarks [Yang et al., 2021], i.e., Cora, Citeseer, Pubmed, Computer, Photo, Corafull, Coauthor-CS, Coauthor-Physics, and 4 large-scale graphs [Hu et al., 2020], i.e., Ogbn-Arxiv, Aminer, Reddit, and Ogbn-Products."
Dataset Splits | Yes | "For each dataset, we follow the dataset protocol in [Chen et al., 2023], where 6/2/2 of the nodes are used as training/validation/test sets, respectively. For the first two datasets, we randomly selected two non-overlapping 10% nodes as the validation and test sets, respectively, and doubled 1% for the last two datasets."
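The 6/2/2 protocol quoted above can be sketched as a simple random index split. This is illustrative only (not the authors' code); the Cora node count of 2708 is used purely as an example.

```python
import numpy as np

def split_nodes(num_nodes, seed=0):
    # Random 6/2/2 split of node indices into train/validation/test,
    # following the protocol attributed to [Chen et al., 2023].
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_nodes)
    n_train = int(0.6 * num_nodes)
    n_val = int(0.2 * num_nodes)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_nodes(2708)  # Cora has 2708 nodes
```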
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "In Figure 5, we explore the sensitivity of hyperparameters β and γ in the overall objective function (Eq. 12) on three citation graphs. β and γ represent the contributions of the ACKA and manifold-mixup logit distillation, respectively. The results indicate that the optimal performance is achieved with β = 10 and γ = 0.1. According to the definition of L_ACKA, its value range is [0, 1]. We monitored the values of each component of the loss function during training and found that, with β = 10 and γ = 0.1, the scales of L_ACKA and L_logit were comparable to the task loss component L_task, leading to optimal model convergence."
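The weighting described above (Eq. 12) can be written as a one-line helper. The component loss magnitudes below are hypothetical, chosen only to illustrate how β = 10 (on an L_ACKA bounded in [0, 1]) and γ = 0.1 can bring the three weighted terms to a comparable scale:

```python
def overall_loss(l_task, l_acka, l_logit, beta=10.0, gamma=0.1):
    # Eq. 12: L_S = L_task + beta * L_ACKA + gamma * L_logit.
    return l_task + beta * l_acka + gamma * l_logit

# Hypothetical component magnitudes (not values from the paper).
l_task, l_acka, l_logit = 0.9, 0.08, 7.5
weighted = {"task": l_task, "ACKA": 10.0 * l_acka, "logit": 0.1 * l_logit}
l_s = overall_loss(l_task, l_acka, l_logit)  # 0.9 + 0.8 + 0.75 = 2.45
```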