Toward Data-centric Directed Graph Learning: An Entropy-driven Approach
Authors: Xunkai Li, Zhengyu Wu, Kaichi Yu, Hongchao Qin, Guang Zeng, Rong-Hua Li, Guoren Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 14 (di)graph datasets spanning both homophily and heterophily settings and across four downstream tasks show that EDEN achieves SOTA results and significantly enhances existing (Di)GNNs. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; ²Alibaba Group, Hangzhou, Zhejiang, China. |
| Pseudocode | Yes | To define the greedy partition tree construction algorithm, we introduce the following meta-operations in Alg. 1. Together, these meta-operations specify the logic of greedy partition tree construction, providing a framework for building hierarchical structures over graph data while minimizing directed edge structural entropy. Building on these foundations, we use the meta-operations to present the detailed workflow of the greedy structural tree construction algorithm. This enables coarse-grained HKT construction from a topological perspective, ultimately achieving digraph knowledge discovery (i.e., Step 1 Knowledge Discovery (a) of our proposed EDEN, as illustrated in Fig. 2). Alg. 2 outlines the construction of a height-limited partition tree, emphasizing the minimization of directed structural uncertainty. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of our proposed EDEN on 10 digraph and 4 undirected graph benchmark datasets... The 10 publicly partitioned digraph datasets include 3 citation networks (Cora ML, Citeseer, and ogbn-arxiv) in (Bojchevski & Günnemann, 2018; Hu et al., 2020), 2 social networks (Slashdot and Epinions) in (Ordozgoiti et al., 2020; Massa & Avesani, 2005), a web-link network (WikiCS) in (Mernyei & Cangea, 2020), a crowd-sourcing network (Tolokers) (Platonov et al., 2023), a syntax network (Empire), a rating network (Rating) (Platonov et al., 2023), and a co-editor network (Leskovec et al., 2010). |
| Dataset Splits | Yes | Table 7 reports the statistics of the experimental (di)graph benchmark datasets: per dataset, #Nodes, #Features, #Edges, #Classes, the node-level train/val/test split, the link-level train/val/test split, and the task/description. For example, Cora ML (homophily, citation): 2,995 nodes, 2,879 features, 8,416 edges, 7 classes; node split 140/500/2,355 (train/val/test); link split 80%/15%/5%; node & link tasks. |
| Hardware Specification | Yes | The experiments are conducted on an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz and an NVIDIA GeForce RTX 3090 with 24GB memory, using CUDA 11.8. |
| Software Dependencies | Yes | As for software versions, we use Python 3.9 and PyTorch 1.11.0. |
| Experiment Setup | Yes | The hyperparameters in the baseline models are set according to the original paper if available. Otherwise, we perform a hyperparameter search via Optuna (Akiba et al., 2019). For our proposed EDEN, during the topology-aware coarse-grained HKT construction, we perform a grid search in the interval [3, 10] to determine the height of the HKT. In the profile-oriented fine-grained HKT correction, a grid search is conducted in the interval [1, 2] to obtain the optimal κ, which determines the knowledge reception field when generating parent-node representations for the current partition. For random walk-based leaf prediction, we search in the interval [0, 1], depending on the node-level or link-level downstream task, to determine the optimal walking probability. Within the same interval, we also search for the knowledge distillation loss weight α, ensuring optimal convergence. |
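The structural-entropy objective quoted in the Pseudocode row (Alg. 1–2 greedily build a partition tree that minimizes directed edge structural entropy) can be illustrated with a minimal sketch. The functions below are hypothetical helpers, not the paper's code: they compute one- and two-level structural entropy on a simple degree-volume approximation, whereas EDEN minimizes a directed-edge variant under a height limit.

```python
import math
from collections import defaultdict

def _degrees(edges):
    """Node degrees from an edge list (both endpoints counted)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def entropy_1d(edges):
    """One-dimensional structural entropy:
    H = -sum_v (d_v / vol) * log2(d_v / vol)."""
    deg = _degrees(edges)
    vol = sum(deg.values())
    return -sum(d / vol * math.log2(d / vol) for d in deg.values())

def entropy_2d(edges, partition):
    """Two-level structural entropy under a node -> community map.
    Lower values mean the partition compresses the topology better;
    a greedy tree builder merges the pair of clusters that lowers
    this quantity most at each step."""
    deg = _degrees(edges)
    vol = sum(deg.values())
    vol_c = defaultdict(int)   # volume of each community
    cut_c = defaultdict(int)   # edge endpoints crossing each community
    for v, c in partition.items():
        vol_c[c] += deg[v]
    for u, v in edges:
        if partition[u] != partition[v]:
            cut_c[partition[u]] += 1
            cut_c[partition[v]] += 1
    h = 0.0
    for v, c in partition.items():
        h -= deg[v] / vol * math.log2(deg[v] / vol_c[c])
    for c, vc in vol_c.items():
        h -= cut_c[c] / vol * math.log2(vc / vol)
    return h
```

On a 4-cycle, the all-singleton partition recovers the one-dimensional entropy, while grouping adjacent nodes into two pairs strictly reduces it; the paper's Alg. 1–2 exploit exactly this kind of reduction, subject to a height limit on the tree.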
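The hyperparameter search described in the Experiment Setup row can be sketched as a plain grid search over the quoted intervals (HKT height in [3, 10], κ in [1, 2], walking probability and distillation weight α in [0, 1]). This is illustrative only: `validate` is a synthetic stand-in for training EDEN and scoring the validation split, and the paper additionally tunes baselines with Optuna rather than this hand-rolled loop.

```python
import itertools

def validate(height, kappa, walk_prob, alpha):
    """Synthetic stand-in for one training run of EDEN evaluated on
    the validation split; the smooth surface just makes the sketch
    runnable and gives the search a unique optimum."""
    return -((height - 6) ** 2) - (walk_prob - 0.5) ** 2 \
        - (alpha - 0.3) ** 2 + kappa * 0.1

def grid_search():
    """Exhaustive search over the intervals quoted in the paper."""
    grid = itertools.product(
        range(3, 11),                  # HKT height in [3, 10]
        (1, 2),                        # knowledge reception field kappa
        [i / 10 for i in range(11)],   # walking probability in [0, 1]
        [i / 10 for i in range(11)],   # distillation weight alpha in [0, 1]
    )
    best = max(grid, key=lambda cfg: validate(*cfg))
    return dict(zip(("height", "kappa", "walk_prob", "alpha"), best))
```

With real training in place of `validate`, each configuration would be scored on the validation split and the arg-max kept, which is the standard pattern behind both a manual grid and an Optuna study.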