Rethinking Graph Contrastive Learning Through Relative Similarity Preservation
Authors: Zhiyuan Ning, Pengfei Wang, Ziyue Qiao, Pengyang Wang, Yuanchun Zhou
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method consistently outperforms 20 existing approaches across both homophily and heterophily graphs, validating the effectiveness of leveraging natural relative similarity over artificial absolute similarity. Section 4, titled 'Experiments', details the experimental setup, datasets, baselines, and evaluation protocol, presenting performance tables (e.g., Table 1, 2, 3, 4) and structural pattern analysis (e.g., Figure 4, 5). |
| Researcher Affiliation | Academia | All authors are affiliated with academic institutions: 'Computer Network Information Center, Chinese Academy of Sciences', 'University of Chinese Academy of Sciences', 'Hangzhou Institute for Advanced Study, UCAS', 'School of Computing and Information Technology, Great Bay University', and 'Department of Computer and Information Science, IOTSC, University of Macau'. The email domains also correspond to these academic institutions (e.g., cnic.cn, gbu.edu.cn, um.edu.mo). |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methodology is described using mathematical equations and textual explanations. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide links to code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | To validate the universality of our approach, we conduct experiments on 11 real-world datasets spanning diverse domains and scales (2K to 169K nodes), including 8 homophily graphs (Cora, Citeseer, Pubmed, Wiki CS, Amazon Computers, Amazon-Photo, Coauthor-CS, and ogbn-arxiv) and 3 heterophily graphs (Chameleon, Squirrel, and Actor). These are widely recognized and publicly available benchmark datasets in graph machine learning. |
| Dataset Splits | Yes | For datasets with multiple splits (e.g., Wiki CS with 20 public splits), we conduct experiments on all provided splits. Following standard practice in GCL [Velickovic et al., 2019; Zhu et al., 2020b; Thakoor et al., 2021], we evaluate using linear evaluation: first train the graph encoder in a self-supervised manner using our relative similarity objectives, then freeze it to generate node embeddings for training a logistic regression classifier. On the large-scale ogbn-arxiv graph dataset (169K nodes, 1.2M edges), the evaluation setup follows [Hu et al., 2020; Thakoor et al., 2021]. |
| Hardware Specification | Yes | We implement RELGCL using GCN [Welling and Kipf, 2016] as the encoder, optimized with Adam [Kingma and Ba, 2014] on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions using GCN as the encoder and Adam for optimization, but it does not specify version numbers for these or any other software libraries or programming languages (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | Based on our empirical analysis in Section 2.1, we set the neighborhood range k = 4 to capture meaningful structural patterns, as the number of semantically similar neighbors becomes particularly small beyond 4 hops. Both approaches use a threshold α ∈ (0, 1) to prevent over-optimization of similarity ratios. We implement RELGCL using GCN [Welling and Kipf, 2016] as the encoder, optimized with Adam [Kingma and Ba, 2014] on an NVIDIA V100 GPU. |
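The Experiment Setup row fixes the neighborhood range at k = 4 on the grounds that few semantically similar neighbors lie beyond 4 hops. As a minimal illustration of what "nodes within k hops" means operationally (not the authors' code; the function name and adjacency-list format are assumptions for this sketch), a breadth-first search bounded at depth k enumerates that neighborhood:

```python
from collections import deque

def k_hop_neighbors(adj, source, k=4):
    """Return the set of nodes reachable from `source` within k hops,
    excluding `source` itself. `adj` maps node -> list of neighbors."""
    seen = {source}
    frontier = deque([(source, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:          # do not expand past the k-hop boundary
            continue
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                result.add(nbr)
                frontier.append((nbr, depth + 1))
    return result

# Path graph 0-1-2-3-4-5: nodes within 4 hops of node 0 are {1, 2, 3, 4}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(sorted(k_hop_neighbors(adj, 0, k=4)))
```

On the path-graph example, node 5 sits 5 hops away and is correctly excluded, mirroring the paper's cutoff beyond which neighbors are no longer treated as structurally meaningful.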