Enhancing Graph Representation Learning with Localized Topological Features

Authors: Zuoyu Yan, Qi Zhao, Ze Ye, Tengfei Ma, Liangcai Gao, Zhi Tang, Yusu Wang, Chao Chen

JMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that the localized topological features greatly enhance the representation learning, and achieve state-of-the-art results on various node classification and link prediction benchmarks. We also explore the option of end-to-end learning of the topological features, i.e., treating topological computation as a differentiable operator during learning. Our theoretical analysis and empirical study provide insights and potential guidelines for employing topological features in graph learning tasks.
Researcher Affiliation Academia Zuoyu Yan1,5, Qi Zhao2, Ze Ye3, Tengfei Ma3, Liangcai Gao1, Zhi Tang1, Yusu Wang4, Chao Chen3. 1 Wangxuan Institute of Computer Technology, Peking University; 2 Computer Science and Engineering Department, University of California, San Diego; 3 Department of Biomedical Informatics, Stony Brook University; 4 Halıcıoğlu Data Science Institute, University of California, San Diego; 5 Weill Cornell Medicine, Cornell University
Pseudocode Yes Algorithm 1 Computation of 1D EPD corresponding to cycles 1: Input: filter function f, input graph G = (V, E) 2: V, E = sorted(V, E, f), PD1 = {} 3: for i ∈ V do 4: Ci = {Cij | (i, j) ∈ E, f(j) > f(i)}, Ei = E 5: for Cij ∈ Ci do 6: f(Cij) = f(i), Ei = Ei − {(i, j)} + {(Cij, j)} 7: end for 8: PD1^i = Union-Find-step(V + Ci − {i}, Ei, f, Ci) 9: PD1 += PD1^i 10: end for 11: Output: PD0, PD1 Algorithm 2 Union-Find-step
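The Union-Find step referenced in Algorithm 1 pairs births and deaths by merging components while sweeping the sorted filtration. As a rough illustration of that mechanism (this is a hypothetical sketch of the standard 0-dimensional persistence sweep, not the paper's Union-Find-step or its 1D variant), a union-find over edges ordered by filter value produces the 0D persistence pairs:

```python
# Hypothetical sketch: 0D persistence pairs of a graph under a vertex
# filter function f, computed with the standard union-find sweep.  An
# edge (u, v) enters the filtration at max(f[u], f[v]); when it merges
# two components, the younger component (larger birth value) dies.

def persistence_0d(f, edges):
    """f: dict vertex -> filter value; edges: iterable of (u, v) pairs.
    Returns (birth, death) pairs for components that die."""
    parent = {v: v for v in f}
    birth = dict(f)  # birth value of each component's root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    # process edges in order of appearance in the filtration
    for u, v in sorted(edges, key=lambda e: max(f[e[0]], f[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle; that case feeds the 1D algorithm
        if birth[ru] < birth[rv]:
            ru, rv = rv, ru  # make ru the younger component
        pairs.append((birth[ru], max(f[u], f[v])))
        parent[ru] = rv
    return pairs
```

For example, on a path a–b–c with f = {a: 0, b: 3, c: 1}, the component born at value 1 (vertex c) dies when the components merge at the saddle vertex b (value 3), yielding the pair (1, 3).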
Open Source Code Yes Source code is available at https://github.com/pkuyzy/TLC-GNN.
Open Datasets Yes 1. Cora, Citeseer, and PubMed (Sen et al., 2008) are standard citation networks where nodes represent scientific papers, and edges denote citations between them. 2. Photo and Computers (Shchur et al., 2018) are graphs derived from Amazon shopping records. 3. Physics and CS (Shchur et al., 2018) are co-authorship graph datasets where nodes represent authors. 4. PPI Networks (Zitnik and Leskovec, 2017) are protein-protein interaction networks originally designed for graph classification tasks.
Dataset Splits Yes We split the training, validation, and test set following (Kipf and Welling, 2017; Veličković et al., 2018). To be specific, the training set consists of 20 nodes from each class, and the validation (test, resp.) set consists of 500 (1000, resp.) nodes. Following (Chami et al., 2019), we use 5% (resp. 10%) of existing links in the input graph as the positive validation set (resp. positive test set). An equal number of non-existent links are sampled as the negative validation set and negative test set. The remaining 85% existing links are used as the positive training set.
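The link-prediction split quoted above (5% of edges for validation, 10% for test, the rest for training, with equal-size negative samples) can be sketched as follows. The helper and its names are hypothetical, following the stated protocol rather than the paper's code:

```python
import random

# Hypothetical sketch of the quoted link-prediction split: 5% / 10% of
# existing edges become the positive validation / test sets, the rest
# train; an equal number of non-edges are sampled as negatives.

def split_links(nodes, edges, val_frac=0.05, test_frac=0.10, seed=0):
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_val = int(len(edges) * val_frac)
    n_test = int(len(edges) * test_frac)
    pos_val = edges[:n_val]
    pos_test = edges[n_val:n_val + n_test]
    pos_train = edges[n_val + n_test:]

    existing = {frozenset(e) for e in edges}

    def sample_negatives(k):
        neg = set()
        while len(neg) < k:
            u, v = rng.sample(nodes, 2)
            e = (min(u, v), max(u, v))  # canonical undirected form
            if frozenset(e) not in existing:
                neg.add(e)
        return list(neg)

    return pos_train, pos_val, pos_test, sample_negatives(n_val), sample_negatives(n_test)
```

Note this sketch does not guard against overlap between the negative validation and test samples; a faithful reimplementation would exclude already-sampled negatives as well.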
Hardware Specification No The paper does not explicitly mention any specific hardware used for running its experiments, such as GPU/CPU models or cloud resources.
Software Dependencies Yes We utilize the Python package Dionysus, version 2.0.7, to compute the EPD.
Experiment Setup Yes We initialize the model with Glorot initialization and use cross-entropy loss and the Adam optimizer to train our model. In the optimizer, the learning rate is 0.005, and the weight decay is 0.0005. The training epoch is set to 200, and the early stopping patience on the validation set is 100 epochs. For a fair comparison, we set the number of node embeddings of the hidden layer to be the same (64) for all networks. For all the models, the number of GNN layers is set to 2. The activation function after each graph convolution block is ELU. Cross-entropy loss is chosen as the loss function and Adam is adopted as the optimizer with the learning rate set to 0.01 and weight decay set to 0. Dropout is 0.8 for Cora and Citeseer, and 0.5 for the rest of the graphs. ... The training epoch is 2000, and the early stopping patience on the validation set is 200 epochs.
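The early-stopping rule quoted above (stop after a fixed number of epochs without validation improvement, e.g. patience 100 of 200 epochs) can be sketched framework-agnostically; `train_epoch` and `validate` are hypothetical placeholder callables, not the paper's training code:

```python
# Hypothetical sketch of training with early stopping on the validation
# set, mirroring the quoted setup (e.g. max 200 epochs, patience 100).

def train_with_early_stopping(train_epoch, validate, max_epochs=200, patience=100):
    """Run up to max_epochs; stop once the validation score has not
    improved for `patience` consecutive epochs."""
    best_score, best_epoch = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch(epoch)
        score = validate(epoch)
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_score, best_epoch
```

In practice the model weights at `best_epoch` would also be checkpointed and restored before testing.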