Graph Structure Refinement with Energy-based Contrastive Learning

Authors: Xianlin Zeng, Yufeng Wang, Yuqi Sun, Guodong Guo, Wenrui Ding, Baochang Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that ECL-GSR outperforms the state-of-the-art on eight benchmark datasets in node classification. ECL-GSR achieves faster training with fewer samples and less memory than the leading baseline, highlighting its simplicity and efficiency in downstream tasks. We conduct comprehensive experiments to sequentially evaluate the proposed framework's effectiveness, complexity, and robustness, addressing five research questions: RQ1: How effective is ECL-GSR on the node classification task? RQ2: How efficient is ECL-GSR in terms of training time and space? RQ3: How do the ECL architecture and its hyperparameters impact the performance of node-level representation learning? RQ4: How robust is ECL-GSR in the face of structural attacks or noise? RQ5: What kind of refined structure does ECL-GSR learn?
Researcher Affiliation Collaboration Xianlin Zeng (1,2), Yufeng Wang (1), Yuqi Sun (1), Guodong Guo (3), Wenrui Ding (1), Baochang Zhang (1). 1: Beihang University, Beijing, P.R. China; 2: Postdoctoral Research Station at China Rong Tong Academy of Sciences Group Corporation Limited, Beijing, P.R. China; 3: Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, P.R. China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes The pseudocode of ECL-GSR is illustrated in Algorithm 1.
Open Source Code No The paper does not explicitly provide a statement about releasing code, nor does it include a link to a code repository for the described methodology.
Open Datasets Yes Datasets For extensive comparison, we execute experiments on eight benchmark datasets: four citation networks (Cora, Citeseer (Sen et al. 2008), Pubmed (Namata et al. 2012), and OGB-Arxiv (Hu et al. 2020)), three webpage graphs (Cornell, Texas, and Wisconsin (Pei et al. 2020)), and one actor co-occurrence network (Actor (Tang et al. 2009)).
Dataset Splits Yes Evaluation on standard splits As stated in Table 1, three key observations can be made: i) ECL-GSR shows robust performance across all benchmark datasets, demonstrating its superior generalizability to diverse data. Notably, across the eight datasets, ECL-GSR achieves the state-of-the-art with margins ranging from 0.15% to 1.61% over the second-highest approach. ... Evaluation on different train ratios In Table 2, we conduct experiments on the Cora and Citeseer datasets with varying amounts of supervised information, specifically at training ratios of 1%, 3%, 5%, and 10%.
Hardware Specification Yes Our framework operates on an Ubuntu system with an NVIDIA GeForce 3090 GPU, employing PyTorch 1.12.1, DGL 1.1.0, and Python 3.9.16.
Software Dependencies Yes Our framework operates on an Ubuntu system with an NVIDIA GeForce 3090 GPU, employing PyTorch 1.12.1, DGL 1.1.0, and Python 3.9.16.
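The stated software stack can be recreated with pinned installs. A minimal sketch, assuming a Python 3.9.16 interpreter is already active (e.g. via conda) and that the standard PyPI distributions are used; the CUDA-specific wheel selection for the GPU is left to the reader:

```shell
# Pin the versions reported in the paper's environment description.
# These version numbers come from the report; the plain PyPI wheels
# are an assumption (CUDA builds may need the vendors' index URLs).
pip install torch==1.12.1
pip install dgl==1.1.0
```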
Experiment Setup Yes Subgraph sampling batch size N is fixed at 64 for efficiency considerations. In ECL, the backbone fθ(·) is divided into ϕθ(·) for encoding, utilizing three GCN layers with hidden and output dimension F̃ of 128, and φθ(·) for projection, comprising two fully-connected layers with an output dimension F of 128. The learned representation Z̃ is produced by ϕθ(·). Batch normalization is discarded when utilizing SGLD. The data augmentation operator T is a random Gaussian blur. For node classification, the classifier Cθ(·) mirrors the architecture of ϕθ(·). Our model's final hyperparameters are set as: α=0.1, β=0.01, µ=0.01, and τ=0.1. We adopt the Adam optimizer with an initial learning rate of 0.001, halved every 20 epochs. The number of epochs P for Cora, Citeseer, Cornell, Texas, and Wisconsin is 40, and for Actor, Pubmed, and OGB-Arxiv it is 80. The number of SGLD iterations K takes only 3 steps.
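The setup quote above can be collected into a single configuration sketch. This is not from an official codebase; the key names are illustrative, and only the values are taken from the reported setup (the learning-rate helper simply encodes "halved every 20 epochs"):

```python
# Hyperparameters as stated in the experiment setup (sketch; key names
# are hypothetical, values are those reported in the paper).
ECL_GSR_CONFIG = {
    "batch_size": 64,        # subgraph sampling batch size N
    "encoder_layers": 3,     # GCN layers in the encoder phi_theta
    "hidden_dim": 128,       # hidden/output dimension of the encoder
    "proj_layers": 2,        # fully-connected projection head phi_theta
    "proj_dim": 128,         # projection output dimension F
    "alpha": 0.1,
    "beta": 0.01,
    "mu": 0.01,
    "tau": 0.1,
    "lr": 1e-3,              # Adam, halved every 20 epochs
    "lr_halve_every": 20,
    "epochs": {"Cora": 40, "Citeseer": 40, "Cornell": 40, "Texas": 40,
               "Wisconsin": 40, "Actor": 80, "Pubmed": 80, "OGB-Arxiv": 80},
    "sgld_steps": 3,         # SGLD iterations K
}

def lr_at_epoch(epoch, base_lr=1e-3, halve_every=20):
    """Learning rate under the reported step schedule: halve every `halve_every` epochs."""
    return base_lr * 0.5 ** (epoch // halve_every)
```

For example, `lr_at_epoch(0)` gives the initial 0.001, while `lr_at_epoch(20)` gives 0.0005, matching the "halved every 20 epochs" schedule.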