Toward Data-centric Directed Graph Learning: An Entropy-driven Approach
Authors: Xunkai Li, Zhengyu Wu, Kaichi Yu, Hongchao Qin, Guang Zeng, Rong-Hua Li, Guoren Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 14 (di)graph datasets spanning both homophily and heterophily settings and across four downstream tasks show that EDEN achieves SOTA results and significantly enhances existing (Di)GNNs. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; ²Alibaba Group, Hangzhou, Zhejiang, China. |
| Pseudocode | Yes | To define the greedy partition tree construction algorithm, we introduce the following meta-operations in Alg. 1. Together, these meta-operations specify the logic of greedy partition tree construction, providing a framework for building hierarchical structures over graph data while minimizing directed edge structural entropy. Building on these foundations, we use the meta-operations to present the detailed workflow of the greedy structural tree construction algorithm. This enables coarse-grained HKT construction from a topological perspective, ultimately achieving digraph knowledge discovery (i.e., Step 1 Knowledge Discovery (a) of our proposed EDEN, as illustrated in Fig. 2). Alg. 2 outlines the construction of a height-limited partition tree, emphasizing the minimization of directed structural uncertainty. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate the performance of our proposed EDEN on 10 digraph and 4 undirected graph benchmark datasets... The 10 publicly partitioned digraph datasets include 3 citation networks (Cora ML, Citeseer, and ogbn-arxiv) in (Bojchevski & Günnemann, 2018; Hu et al., 2020), 2 social networks (Slashdot and Epinions) in (Ordozgoiti et al., 2020; Massa & Avesani, 2005), a web-link network (WikiCS) in (Mernyei & Cangea, 2020), a crowd-sourcing network (Tolokers) (Platonov et al., 2023), a syntax network (Empire), a rating network (Rating) (Platonov et al., 2023), and a co-editor network (Leskovec et al., 2010). |
| Dataset Splits | Yes | Table 7 reports the statistics of the experimental (di)graph benchmark datasets: per dataset, #Nodes, #Features, #Edges, #Classes, the node-level train/val/test split, the link-level train/val/test split, and the task/description. For example, Cora ML (homophily, citation): 2,995 nodes, 2,879 features, 8,416 edges, 7 classes; node split 140/500/2,355 (train/val/test); link split 80%/15%/5%; node & link tasks. |
| Hardware Specification | Yes | The experiments are conducted on an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz and an NVIDIA GeForce RTX 3090 with 24GB memory, using CUDA 11.8. |
| Software Dependencies | Yes | As for software versions, we use Python 3.9 and PyTorch 1.11.0. |
| Experiment Setup | Yes | The hyperparameters in the baseline models are set according to the original paper if available. Otherwise, we perform a hyperparameter search via Optuna (Akiba et al., 2019). For our proposed EDEN, during the topology-aware coarse-grained HKT construction, we perform a grid search in the interval [3, 10] to determine the height of the HKT. In the profile-oriented fine-grained HKT correction, a grid search is conducted in the interval [1, 2] to obtain the optimal κ, which determines the knowledge reception field when generating parent-node representations for the current partition. For random walk-based leaf prediction, we search in the interval [0, 1], depending on the node-level or link-level downstream task, to determine the optimal walking probability. Within the same interval, we also search for the knowledge distillation loss weight α, ensuring optimal convergence. |
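The structural-entropy objective quoted in the Pseudocode row (Alg. 1–2 greedily build a partition tree that minimizes directed edge structural entropy) can be illustrated with a minimal sketch. The functions below are hypothetical helpers, not the paper's code: they compute one- and two-level structural entropy on a simple degree-volume approximation, whereas EDEN minimizes a directed-edge variant under a height limit.

```python
import math
from collections import defaultdict

def _degrees(edges):
    """Node degrees from an edge list (both endpoints counted)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def entropy_1d(edges):
    """One-dimensional structural entropy:
    H = -sum_v (d_v / vol) * log2(d_v / vol)."""
    deg = _degrees(edges)
    vol = sum(deg.values())
    return -sum(d / vol * math.log2(d / vol) for d in deg.values())

def entropy_2d(edges, partition):
    """Two-level structural entropy under a node -> community map.
    Lower values mean the partition compresses the topology better;
    a greedy tree builder merges the pair of clusters that lowers
    this quantity most at each step."""
    deg = _degrees(edges)
    vol = sum(deg.values())
    vol_c = defaultdict(int)   # volume of each community
    cut_c = defaultdict(int)   # edge endpoints crossing each community
    for v, c in partition.items():
        vol_c[c] += deg[v]
    for u, v in edges:
        if partition[u] != partition[v]:
            cut_c[partition[u]] += 1
            cut_c[partition[v]] += 1
    h = 0.0
    for v, c in partition.items():
        h -= deg[v] / vol * math.log2(deg[v] / vol_c[c])
    for c, vc in vol_c.items():
        h -= cut_c[c] / vol * math.log2(vc / vol)
    return h
```

On a 4-cycle, the all-singleton partition recovers the one-dimensional entropy, while grouping adjacent nodes into two pairs strictly reduces it; the paper's Alg. 1–2 exploit exactly this kind of reduction, subject to a height limit on the tree.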
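The hyperparameter search described in the Experiment Setup row can be sketched as a plain grid search over the quoted intervals (HKT height in [3, 10], κ in [1, 2], walking probability and distillation weight α in [0, 1]). This is illustrative only: `validate` is a synthetic stand-in for training EDEN and scoring the validation split, and the paper additionally tunes baselines with Optuna rather than this hand-rolled loop.

```python
import itertools

def validate(height, kappa, walk_prob, alpha):
    """Synthetic stand-in for one training run of EDEN evaluated on
    the validation split; the smooth surface just makes the sketch
    runnable and gives the search a unique optimum."""
    return -((height - 6) ** 2) - (walk_prob - 0.5) ** 2 \
        - (alpha - 0.3) ** 2 + kappa * 0.1

def grid_search():
    """Exhaustive search over the intervals quoted in the paper."""
    grid = itertools.product(
        range(3, 11),                  # HKT height in [3, 10]
        (1, 2),                        # knowledge reception field kappa
        [i / 10 for i in range(11)],   # walking probability in [0, 1]
        [i / 10 for i in range(11)],   # distillation weight alpha in [0, 1]
    )
    best = max(grid, key=lambda cfg: validate(*cfg))
    return dict(zip(("height", "kappa", "walk_prob", "alpha"), best))
```

With real training in place of `validate`, each configuration would be scored on the validation split and the arg-max kept, which is the standard pattern behind both a manual grid and an Optuna study.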