Aggregation Mechanism Based Graph Heterogeneous Networks Distillation
Authors: Xiaobin Hong, Mingkai Lin, Xiangkai Ma, Wenzhong Li, Sanglu Lu
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on 8 standard and 4 large-scale datasets demonstrate that AMEND consistently outperforms state-of-the-art distillation methods. To fully evaluate the proposed method, we conduct extensive experiments on 8 regular graph datasets and 4 large-scale graph datasets to compare with state-of-the-art methods. |
| Researcher Affiliation | Academia | Xiaobin Hong , Mingkai Lin , Xiangkai Ma , Wenzhong Li , Sanglu Lu State Key Laboratory for Novel Software Technology, Nanjing University EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 AMEND Algorithm. Input: graph G = {V, E}, node feature matrix X, and precomputed position encoding Xpe. Output: optimized parameters of the student MLP S, predicted node labels Ŷ. 1: Model initialization and dataset partitioning. 2: Pretrain the teacher model T with cross-entropy loss. 3: # Student MLP training 4: for epochs do 5: # Aggregation context preservation 6: Z_T = T(X, E, Xpe); 7: Z_S = S(X, Xpe); 8: # Aggregation-enhanced CKA 9: L_ACKA = ACKA(Z_T, Z_S) in Eq. 7; 10: # Shared manifold mixup 11: Z_T^mix = λZ_T + (1 − λ)Z′_T; 12: Z_S^mix = λZ_S + (1 − λ)Z′_S; 13: Ŷ_T, Ŷ_S ← g_T(Z_T), g_S(Z_S); 14: Ŷ_T^mix, Ŷ_S^mix ← g_T(Z_T^mix), g_S(Z_S^mix); 15: # Logit distillation 16: L_logit = L_mix + L_pred in Eq. 11; 17: # Overall loss computation 18: L_S = L_task + βL_ACKA + γL_logit in Eq. 12; 19: Gradient backward and model optimization. 20: end for 21: return S, Ŷ |
| Open Source Code | No | The paper does not explicitly state that source code is provided or give a link to a repository. Phrases like "we release our code" or similar are not present. |
| Open Datasets | Yes | Datasets. To fully evaluate our proposed method, we use 8 public regular graph benchmarks [Yang et al., 2021], i.e. Cora, Citeseer, Pubmed, Computer, Photo, Corafull, Coauthor-CS, Coauthor-Physics, and 4 large-scale graphs [Hu et al., 2020], i.e., Ogbn-Arxiv, Aminer, Reddit, and Ogbn-Products. |
| Dataset Splits | Yes | For each dataset, we follow the dataset protocol in [Chen et al., 2023], where 6/2/2 of the nodes are used as training/validation/test sets, respectively. For the first two datasets, we randomly selected two non-overlapping 10% nodes as the validation and test sets, respectively, and doubled 1% for the last two datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In Figure 5, we explore the sensitivity of hyperparameters β and γ in the overall objective function (Eq. 12) on three citation graphs. β and γ represent the contributions of the ACKA and manifold mixup logit distillation, respectively. The results indicate that the optimal performance is achieved with β = 10 and γ = 0.1. According to the definition of LACKA, its value range is [0, 1]. We monitored the values of each component of the loss function during training and found that, with β = 10, γ = 0.1, the scales of LACKA and Llogit were comparable to the task loss component Ltask, leading to optimal model convergence. |
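The loss computation in Algorithm 1 (lines 6–18) can be sketched as a NumPy forward pass. This is only a hedged illustration: plain linear CKA stands in for the paper's aggregation-enhanced CKA (Eq. 7), whose exact form is not quoted above; the KL-based logit terms, the fixed mixing coefficient `lam` (mixup typically samples λ from a Beta distribution), and all function names are our assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def linear_cka(zt, zs):
    # Plain linear CKA between teacher and student embeddings;
    # a stand-in for the paper's aggregation-enhanced variant (Eq. 7).
    zt = zt - zt.mean(axis=0, keepdims=True)
    zs = zs - zs.mean(axis=0, keepdims=True)
    num = np.linalg.norm(zs.T @ zt) ** 2
    den = np.linalg.norm(zt.T @ zt) * np.linalg.norm(zs.T @ zs) + 1e-12
    return num / den

def amend_loss(z_t, z_s, w_t, w_s, y, mask,
               beta=10.0, gamma=0.1, lam=0.5, seed=0):
    # One forward loss evaluation following Algorithm 1; w_t/w_s play the
    # role of the linear heads g_T/g_S, mask marks labeled training nodes.
    l_acka = 1.0 - linear_cka(z_t, z_s)
    # Shared manifold mixup: same lambda and permutation in both spaces.
    perm = np.random.default_rng(seed).permutation(z_t.shape[0])
    z_t_mix = lam * z_t + (1 - lam) * z_t[perm]
    z_s_mix = lam * z_s + (1 - lam) * z_s[perm]

    def kl(t_logits, s_logits):
        # Mean per-node KL(teacher || student) on softmax distributions.
        p, q = softmax(t_logits), softmax(s_logits)
        return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)))

    # Logit distillation on mixed and clean embeddings (L_mix + L_pred, Eq. 11).
    l_logit = kl(z_t_mix @ w_t, z_s_mix @ w_s) + kl(z_t @ w_t, z_s @ w_s)
    # Supervised cross-entropy task loss on labeled nodes.
    p_s = softmax(z_s[mask] @ w_s)
    l_task = float(-np.mean(np.log(p_s[np.arange(mask.sum()), y[mask]] + 1e-12)))
    return l_task + beta * l_acka + gamma * l_logit  # Eq. 12
```

With β = 10 the ACKA term (bounded in [0, 1]) and the logit term scaled by γ = 0.1 end up on a scale comparable to the task loss, matching the convergence behavior the paper reports.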
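The 6/2/2 node split protocol quoted in the Dataset Splits row can be sketched in a few lines; the function name and seeding are ours, and [Chen et al., 2023] may differ in details such as stratification.

```python
import numpy as np

def split_nodes(n, frac=(0.6, 0.2, 0.2), seed=0):
    # Random 60/20/20 train/val/test node split, approximating the
    # protocol of [Chen et al., 2023] that the paper follows.
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(frac[0] * n), int(frac[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```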