TINED: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation

Authors: Ziang Zhou, Zhihao Ding, Jieming Shi, Qing Li, Shiqi Shen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that TINED outperforms GNNs and leading distillation methods across various settings and seven datasets. Source code is available at https://github.com/scottjiao/TINED_ICML25/. Our contributions are as follows: ... Extensive experiments demonstrate that TINED achieves superior performance with various GNN teachers on widely-adopted benchmark datasets.
Researcher Affiliation | Collaboration | 1Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China; 2WeChat, Tencent, Beijing, China. Correspondence to: Jieming Shi <EMAIL>.
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Source code is available at https://github.com/scottjiao/TINED_ICML25/.
Open Datasets | Yes | Datasets. We use 7 widely used public benchmark datasets, including Cora, Citeseer, Pubmed, A-computer, A-photo (Zhang et al., 2022b; Yang et al., 2021), and Arxiv and Products (Hu et al., 2020), two large OGB datasets, to evaluate our method and baselines.
Dataset Splits | Yes | Table 10 in Appendix A.1 provides the data statistics and splits. For all datasets, we follow the setting in (Zhang et al., 2022b; Yang et al., 2021) to split the data. Specifically, for the first five datasets, we use the splitting in (Yang et al., 2021), and each random seed corresponds to a different split. For the OGB datasets Arxiv and Products, we follow the OGB official splits, based on time and popularity respectively.
Hardware Specification | Yes | The experiments on both baselines and our approach are implemented using PyTorch, the DGL (Wang et al., 2019) library for GNN algorithms, and Adam (Kingma and Ba, 2015) for optimization. We run all experiments on an Intel(R) Xeon(R) Platinum 8338C CPU @ 2.60GHz and an NVIDIA GeForce RTX 3090 card with CUDA version 11.7.
Software Dependencies | No | The paper mentions PyTorch, the DGL library, the Adam optimizer, and CUDA version 11.7. However, it does not provide version numbers for PyTorch or DGL, which are key software components; only CUDA has a version specified, which is not sufficient to reconstruct a reproducible software stack.
Experiment Setup | Yes | Appendix A.7 details the hyperparameter search space: learning rate from [0.0001, 0.0005, 0.001, 0.005, 0.01], weight decay from [0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01], weight of distillation λ from [0.1, 0.4, 0.5, 0.6, 1], normalization type from [batch normalization, layer normalization, none], dropout from [0, 0.1, 0.3, 0.5, 0.8], batch size for the two large OGB datasets from [512, 1024, 4096], weight of DED β from [1e-6, 5e-5, 1e-5, 0.05, 0.1, 0.5, 1, 5, 10], and fine-tuning weight η for injected teacher FT layers from [0.01, 0.1, 0.5, 1, 3, 10].
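For reference, the reported search space can be written down as a plain grid. This is a sketch only: the dictionary key names below are illustrative and are not taken from the released code, and the grid-enumeration helper is a generic cross-product, not the paper's actual tuning procedure.

```python
from itertools import product

# Hyperparameter search space as reported in Appendix A.7 of the paper.
# Key names are illustrative; the released code may use different ones.
search_space = {
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005, 0.01],
    "weight_decay": [0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01],
    "distill_weight_lambda": [0.1, 0.4, 0.5, 0.6, 1],
    "normalization": ["batch", "layer", "none"],
    "dropout": [0, 0.1, 0.3, 0.5, 0.8],
    "ded_weight_beta": [1e-6, 5e-5, 1e-5, 0.05, 0.1, 0.5, 1, 5, 10],
    "finetune_weight_eta": [0.01, 0.1, 0.5, 1, 3, 10],
}

# Batch size is tuned only on the two large OGB datasets (Arxiv, Products).
ogb_batch_sizes = [512, 1024, 4096]

def grid(space):
    """Yield every configuration in the cross-product of the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

n_configs = sum(1 for _ in grid(search_space))
```

Enumerating the full grid gives 5 × 6 × 5 × 3 × 5 × 9 × 6 = 121,500 configurations before the OGB-only batch-size choice, which shows why papers typically search such spaces with random or staged sampling rather than exhaustively.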