Self-supervised Masked Graph Autoencoder via Structure-aware Curriculum

Authors: Haoyang Li, Xin Wang, Zeyang Zhang, Zongyuan Wu, Linxin Xiao, Wenwu Zhu

ICML 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct comprehensive experiments to evaluate the effectiveness of the proposed Cur-MGAE method. This includes the experimental setup, quantitative evaluations on node classification and link prediction benchmarks, and in-depth analyses. Additional experimental results are provided in Appendix G. Experiments on several real-world node classification and link prediction datasets demonstrate the superiority of our proposed method over state-of-the-art graph self-supervised learning baselines.
Researcher Affiliation | Academia | Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, China. Correspondence to: Xin Wang <EMAIL>, Wenwu Zhu <EMAIL>.
Pseudocode | Yes | Algorithm 1: The optimization process of Cur-MGAE
Open Source Code | No | No explicit statement or link for the open-source code of the described methodology is provided in the paper; the paper only provides dataset licenses and links.
Open Datasets | Yes | The datasets included in this work are publicly available as follows: 1. Planetoid datasets: https://github.com/kimiyoung/planetoid/raw/master/data/ (MIT License). 2. Coauthor datasets: https://github.com/shchur/gnn-benchmark/raw/master/data/npz/ (MIT License). 3. Open Graph Benchmark (OGB): https://ogb.stanford.edu/docs/graphprop/ (MIT License).
Dataset Splits | Yes | For datasets from the OGB benchmark, we follow the standardized experimental protocol, which provides predefined train/validation/test splits. ... Table 5. Summary of the dataset statistics (train/validation/test %): Cora 85/5/10; OGBL-ddi 80/10/10; OGBL-collab 92/4/4; OGBL-ppa 70/20/10. ... For node classification, we evaluate the learned node representations using a downstream linear SVM classifier. We report the average 10-fold cross-validation accuracy with standard deviation over three repeated runs.
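The quoted evaluation protocol (a linear SVM fit on frozen node embeddings, scored by average 10-fold cross-validation accuracy over three repeated runs) can be sketched as below using scikit-learn. The random features and labels are toy placeholders standing in for the learned representations, and the helper name `evaluate_embeddings` is an assumption for illustration, not from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def evaluate_embeddings(X, y, folds=10, repeats=3, seed=0):
    """Mean and std of k-fold CV accuracy of a linear SVM over repeated runs."""
    run_accs = []
    for r in range(repeats):
        # Re-shuffle the folds on each repeat so the runs differ.
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + r)
        scores = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=cv)
        run_accs.append(scores.mean())
    return float(np.mean(run_accs)), float(np.std(run_accs))

# Toy stand-in: 300 "nodes" with d = 128 features and 3 balanced classes.
X = np.random.RandomState(0).randn(300, 128)
y = np.repeat(np.arange(3), 100)
mean_acc, std_acc = evaluate_embeddings(X, y)
```

On real benchmarks the features would be the encoder's output embeddings rather than random noise; only the mean and standard deviation across the repeated runs are reported.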
Hardware Specification | Yes | We conduct the experiments with the following hardware and software configurations: Operating System: Ubuntu 20.04.6 LTS; CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz; GPU: NVIDIA GeForce RTX 4090.
Software Dependencies | Yes | Software: Python 3.8.13; PyTorch 2.0.1; PyTorch Geometric 2.3.1.
Experiment Setup | Yes | We implement our models using PyTorch and employ Stochastic Gradient Descent (SGD) as the optimizer. The number of training epochs is set to 400 for node classification tasks and 200 for link prediction tasks, with an early stopping patience of 50 steps. ... For large-scale datasets from the OGB benchmark ... we set the representation dimensionality d to 256 and use a 3-layer GNN. For all other datasets, we set d = 128 and use a 2-layer GNN. ... The cross-correlation decoder is implemented as a two-layer multilayer perceptron (MLP) with ReLU activation, and its hidden dimension is selected from {128, 256, 512, 1024}. The values for split ratio and mask ratio are tuned within the ranges [0, 1] and [0.4, 1] (with a step size of 0.1), respectively. The dropout rate is chosen from {0.3, 0.4, 0.5, 0.6}. ... The hyperparameter λ controls the number of edges selected during training. ... λ is designed to increase with the iteration step t, following the schedule below (Zhang et al., 2023a):

λ(t) = (2·λ_initial / 3) · (t/T + 1)  if t < T/2;  λ(t) = λ_initial  otherwise.  (7)
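One plausible reading of the pacing schedule in Eq. (7) ramps λ linearly from (2/3)·λ_initial at t = 0 up to λ_initial at t = T/2, then holds it constant for the rest of training, which matches the stated behavior of λ increasing with the iteration step t. The extracted formula is ambiguous, so the exact ramp shape here is a reconstruction and the function name is invented for illustration:

```python
def lambda_schedule(t, T, lam_initial):
    """Curriculum pacing for the edge-selection weight (Eq. 7, reconstructed):
    ramp linearly from (2/3)*lam_initial at t=0 up to lam_initial at t=T/2,
    then hold it constant for the remaining iterations."""
    if t < T / 2:
        return (2.0 * lam_initial / 3.0) * (t / T + 1.0)
    return lam_initial

# Under this reading the schedule is continuous at t = T/2 and
# non-decreasing in t, e.g. over a 400-epoch node-classification run:
vals = [lambda_schedule(t, 400, 0.9) for t in range(0, 401, 50)]
```

The continuity at the switch point (both branches give λ_initial at t = T/2) is the main property motivating this reconstruction: the curriculum grows the set of selected edges smoothly rather than jumping.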