Self-supervised Masked Graph Autoencoder via Structure-aware Curriculum

Authors: Haoyang Li, Xin Wang, Zeyang Zhang, Zongyuan Wu, Linxin Xiao, Wenwu Zhu

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct comprehensive experiments to evaluate the effectiveness of the proposed Cur-MGAE method. This includes the experimental setup, quantitative evaluations on node classification and link prediction benchmarks, and in-depth analyses. Additional experimental results are provided in Appendix G. Experiments on several real-world node classification and link prediction datasets demonstrate the superiority of our proposed method over state-of-the-art graph self-supervised learning baselines.
Researcher Affiliation | Academia | Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, China. Correspondence to: Xin Wang <EMAIL>, Wenwu Zhu <EMAIL>.
Pseudocode | Yes | Algorithm 1: The optimization process of Cur-MGAE
Open Source Code | No | No explicit statement or link for the open-source code of the described methodology is provided in the paper; the paper only provides dataset licenses and links.
Open Datasets | Yes | The datasets included in this work are publicly available as follows: 1. Planetoid datasets: https://github.com/kimiyoung/planetoid/raw/master/data/ (MIT License). 2. Coauthor datasets: https://github.com/shchur/gnn-benchmark/raw/master/data/npz/ (MIT License). 3. Open Graph Benchmark (OGB): https://ogb.stanford.edu/docs/graphprop/ (MIT License).
Dataset Splits | Yes | For datasets from the OGB benchmark, we follow the standardized experimental protocol, which provides predefined train/validation/test splits. ... Table 5. Summary of the dataset statistics (train/validation/test %): Cora 85/5/10; OGBL-ddi 80/10/10; OGBL-collab 92/4/4; OGBL-ppa 70/20/10. ... For node classification, we evaluate the learned node representations using a downstream linear SVM classifier. We report the average 10-fold cross-validation accuracy with standard deviation over three repeated runs.
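The quoted evaluation protocol (a linear SVM fit on frozen node embeddings, scored by average 10-fold cross-validation accuracy over three repeated runs) can be sketched as below using scikit-learn. The random features and labels are toy placeholders standing in for the learned representations, and the helper name `evaluate_embeddings` is an assumption for illustration, not from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def evaluate_embeddings(X, y, folds=10, repeats=3, seed=0):
    """Mean and std of k-fold CV accuracy of a linear SVM over repeated runs."""
    run_accs = []
    for r in range(repeats):
        # Re-shuffle the folds on each repeat so the runs differ.
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        scores = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=cv)
        run_accs.append(scores.mean())
    return float(np.mean(run_accs)), float(np.std(run_accs))

# Toy stand-in: 300 "nodes" with d = 128 features and 3 balanced classes.
X = np.random.RandomState(0).randn(300, 128)
y = np.repeat(np.arange(3), 100)
mean_acc, std_acc = evaluate_embeddings(X, y)
```

On real benchmarks the features would be the encoder's output embeddings rather than random noise; only the mean and standard deviation across the repeated runs are reported.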
Hardware Specification | Yes | We conduct the experiments with the following hardware and software configurations: Operating System: Ubuntu 20.04.6 LTS; CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz; GPU: NVIDIA GeForce RTX 4090.
Software Dependencies | Yes | Software: Python 3.8.13; PyTorch 2.0.1; PyTorch Geometric 2.3.1.
Experiment Setup | Yes | We implement our models using PyTorch and employ Stochastic Gradient Descent (SGD) as the optimizer. The number of training epochs is set to 400 for node classification tasks and 200 for link prediction tasks, with an early stopping patience of 50 steps. ... For large-scale datasets from the OGB benchmark ... we set the representation dimensionality d to 256 and use a 3-layer GNN. For all other datasets, we set d = 128 and use a 2-layer GNN. ... The cross-correlation decoder is implemented as a two-layer multilayer perceptron (MLP) with ReLU activation, and its hidden dimension is selected from {128, 256, 512, 1024}. The values for split ratio and mask ratio are tuned within the ranges [0, 1] and [0.4, 1] (with a step size of 0.1), respectively. The dropout rate is chosen from {0.3, 0.4, 0.5, 0.6}. ... The hyperparameter λ controls the number of edges selected during training. ... λ is designed to increase with the iteration step t, following the schedule below (Zhang et al., 2023a):

λ(t) = (2·λ_initial / 3) · (t/T + 1)  if t < T/2;  λ(t) = λ_initial  otherwise.  (7)
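One plausible reading of the pacing schedule in Eq. (7) ramps λ linearly from (2/3)·λ_initial at t = 0 up to λ_initial at t = T/2, then holds it constant for the rest of training, which matches the stated behavior of λ increasing with the iteration step t. The extracted formula is ambiguous, so the exact ramp shape here is a reconstruction and the function name is invented for illustration:

```python
def lambda_schedule(t, T, lam_initial):
    """Curriculum pacing for the edge-selection weight (Eq. 7, reconstructed):
    ramp linearly from (2/3)*lam_initial at t=0 up to lam_initial at t=T/2,
    then hold it constant for the remaining iterations."""
    if t < T / 2:
        return (2.0 * lam_initial / 3.0) * (t / T + 1.0)
    return lam_initial

# Under this reading the schedule is continuous at t = T/2 and
# non-decreasing in t, e.g. over a 400-epoch node-classification run:
vals = [lambda_schedule(t, 400, 0.9) for t in range(0, 401, 50)]
```

The continuity at the switch point (both branches give λ_initial at t = T/2) is the main property motivating this reconstruction: the curriculum grows the set of selected edges smoothly rather than jumping.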