Toward Data-centric Directed Graph Learning: An Entropy-driven Approach

Authors: Xunkai Li, Zhengyu Wu, Kaichi Yu, Hongchao Qin, Guang Zeng, Rong-Hua Li, Guoren Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 14 (di)graph datasets spanning both homophily and heterophily settings and four downstream tasks show that EDEN achieves SOTA results and significantly enhances existing (Di)GNNs.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (2) Alibaba Group, Hangzhou, Zhejiang, China.
Pseudocode | Yes | To define the greedy partition tree construction algorithm, we introduce meta-operations in Alg. 1. These meta-operations define the logic of the greedy construction, providing a framework for building hierarchical structures over graph data while minimizing directed edge structural entropy. Building on these foundations, we use the meta-operations to present the detailed workflow of the greedy structural tree construction algorithm. This enables coarse-grained HKT construction from a topological perspective, ultimately achieving digraph knowledge discovery (i.e., Step 1 Knowledge Discovery (a) in our proposed EDEN, as illustrated in Fig. 2). Alg. 2 outlines the construction of a height-limited partition tree, emphasizing the minimization of directed structural uncertainty.
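The greedy, entropy-minimizing partition construction described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it uses the classical undirected two-level structural entropy rather than EDEN's directed formulation, builds a flat partition rather than a height-limited tree, and the function names (`structural_entropy`, `greedy_partition`) are our own.

```python
import math
from itertools import combinations

def structural_entropy(edges, partition):
    """Two-level structural entropy of an undirected graph under a
    node -> module partition (simplified, undirected analogue)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    vol = sum(deg.values())                      # total volume = 2|E|
    modules = set(partition.values())
    vol_m = {m: 0 for m in modules}              # module volumes
    cut = {m: 0 for m in modules}                # boundary edge counts
    for n, d in deg.items():
        vol_m[partition[n]] += d
    for u, v in edges:
        if partition[u] != partition[v]:
            cut[partition[u]] += 1
            cut[partition[v]] += 1
    h = 0.0
    for n, d in deg.items():                     # intra-module term
        h -= d / vol * math.log2(d / vol_m[partition[n]])
    for m in modules:                            # boundary term
        if vol_m[m] > 0:
            h -= cut[m] / vol * math.log2(vol_m[m] / vol)
    return h

def greedy_partition(edges, nodes):
    """Start from singleton modules; greedily merge any pair of
    modules while the merge strictly lowers the structural entropy."""
    part = {n: i for i, n in enumerate(nodes)}
    best = structural_entropy(edges, part)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(set(part.values()), 2):
            trial = {n: (a if m == b else m) for n, m in part.items()}
            h = structural_entropy(edges, trial)
            if h < best - 1e-12:
                part, best, improved = trial, h, True
                break
    return part, best
```

Each accepted merge strictly lowers the entropy, mirroring the greedy minimization that drives the coarse-grained partition tree construction; EDEN additionally constrains the tree height and uses a directed entropy objective.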
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | We evaluate the performance of our proposed EDEN on 10 digraph and 4 undirected graph benchmark datasets... The 10 publicly partitioned digraph datasets include 3 citation networks (Cora ML, Citeseer, and ogbn-arxiv) (Bojchevski & Günnemann, 2018; Hu et al., 2020), 2 social networks (Slashdot and Epinions) (Ordozgoiti et al., 2020; Massa & Avesani, 2005), a web-link network (WikiCS) (Mernyei & Cangea, 2020), a crowd-sourcing network (Tolokers) (Platonov et al., 2023), a syntax network (Empire) and a rating network (Rating) (Platonov et al., 2023), and a co-editor network (Leskovec et al., 2010).
Dataset Splits | Yes | Table 7 reports the statistics of the experimental (di)graph benchmark datasets. For example (homophily, citation): Cora ML has 2,995 nodes, 2,879 features, 8,416 edges, and 7 classes, with a node-level train/val/test split of 140/500/2355 and a link-level split of 80%/15%/5%; it is used for both node- and link-level tasks.
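For illustration, splits of the reported shapes can be generated generically as below. This is a hypothetical sketch (`node_split` and `link_split` are our names and use random shuffling), not the authors' released fixed public partitions.

```python
import random

def node_split(num_nodes, n_train=140, n_val=500, seed=0):
    """Fixed-count node split (e.g. Cora ML: 140/500/2355);
    all remaining nodes form the test set."""
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

def link_split(edges, ratios=(0.8, 0.15, 0.05), seed=0):
    """Ratio-based link split (e.g. 80%/15%/5% train/val/test)."""
    e = list(edges)
    random.Random(seed).shuffle(e)
    n = len(e)
    a = int(ratios[0] * n)
    b = int((ratios[0] + ratios[1]) * n)
    return e[:a], e[a:b], e[b:]
```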
Hardware Specification | Yes | The experiments are conducted on an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz and an NVIDIA GeForce RTX 3090 with 24GB memory, with CUDA 11.8.
Software Dependencies | Yes | As for software versions, we use Python 3.9 and PyTorch 1.11.0.
Experiment Setup | Yes | The hyperparameters in the baseline models are set according to the original paper if available. Otherwise, we perform a hyperparameter search via Optuna (Akiba et al., 2019). For our proposed EDEN, during the topology-aware coarse-grained HKT construction, we perform a grid search over the interval [3, 10] to determine the height of the HKT. In the profile-oriented fine-grained HKT correction, a grid search over the interval [1, 2] determines the optimal κ, which decides the knowledge reception field when generating parent node representations for the current partition. For random walk-based leaf prediction, we search over the interval [0, 1], based on the node-level or link-level downstream task, to determine the optimal walking probability. Within the same interval, we also search for the knowledge distillation loss weight α, ensuring optimal convergence.
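The described search could be sketched as an exhaustive grid over the stated ranges. This is an illustrative stand-in, not the authors' procedure: the step size of 0.1 for the continuous [0, 1] intervals is our assumption, the `evaluate` callback (standing in for training plus validation scoring) is hypothetical, and the paper uses Optuna for baseline hyperparameters rather than a hand-rolled loop.

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustive grid search over the hyperparameter ranges reported
    for EDEN: HKT height in [3, 10], kappa in [1, 2], random-walk
    probability and distillation weight alpha in [0, 1] (0.1 steps,
    our assumption). `evaluate(h, k, p, a)` returns a validation score."""
    heights = range(3, 11)                 # HKT height
    kappas = (1, 2)                        # knowledge reception field
    probs = [i / 10 for i in range(11)]    # random-walk probability
    alphas = [i / 10 for i in range(11)]   # distillation loss weight
    best_cfg, best_score = None, float("-inf")
    for h, k, p, a in product(heights, kappas, probs, alphas):
        score = evaluate(h, k, p, a)
        if score > best_score:             # keep the best configuration
            best_cfg, best_score = (h, k, p, a), score
    return best_cfg, best_score
```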