Node Duplication Improves Cold-start Link Prediction
Authors: Zhichun Guo, Tong Zhao, Yozen Liu, Kaiwen Dong, William Shiao, Mingxuan Ju, Neil Shah, Nitesh V Chawla
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that Node Dup achieves 38.49%, 13.34%, and 6.76% relative improvements on isolated, low-degree, and warm nodes, respectively, on average across all datasets compared to GNNs and the existing cold-start methods. |
| Researcher Affiliation | Collaboration | 1University of Washington 2Snap Inc. 3University of Notre Dame 4University of California, Riverside |
| Pseudocode | Yes | Algorithm 1: Node Dup. |
| Open Source Code | Yes | Our code can be found at https://github.com/zhichunguo/NodeDup. |
| Open Datasets | Yes | We conduct experiments on 7 benchmark datasets: Cora, Citeseer, CS, Physics, Computers, Photos and IGB-100K, with their details specified in Appendix B. |
| Dataset Splits | Yes | We randomly split edges into training, validation, and testing sets. We allocated 10% for validation and 40% for testing in Computers and Photos, 5%/10% for testing in IGB-100K, and 10%/20% in other datasets. |
| Hardware Specification | Yes | All methods were implemented in Python 3.10.9 with PyTorch 1.13.1 and PyTorch Geometric (Fey & Lenssen, 2019). The experiments were all conducted on an NVIDIA P100 GPU with 16GB memory. |
| Software Dependencies | Yes | All methods were implemented in Python 3.10.9 with PyTorch 1.13.1 and PyTorch Geometric (Fey & Lenssen, 2019). |
| Experiment Setup | Yes | We use 2-layer GNN architectures with 256 hidden dimensions for all GNNs and datasets. The dropout rate is set as 0.5. We report the results over 10 random seeds. Hyperparameters were tuned using an early stopping strategy based on performance on the validation set. We manually tune the learning rate for the final results. For the results with the inner product as the decoder, we tune the learning rate over the range lr ∈ {0.001, 0.0005, 0.0001, 0.00005}. For the results with MLP as the decoder, we tune the learning rate over the range lr ∈ {0.01, 0.005, 0.001, 0.0005}. |
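The experiment setup above (a 2-layer GNN encoder with 256 hidden dimensions, dropout 0.5, and an inner-product link decoder) can be sketched in plain PyTorch. This is a minimal illustration, not the authors' implementation: the paper uses PyTorch Geometric, whereas the `GCNLayer`, `Encoder`, and `inner_product_decoder` names here are hypothetical, and the dense normalized adjacency is used only to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """Minimal GCN-style layer: normalized aggregation followed by a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # adj_norm: symmetrically normalized adjacency (dense, for this sketch)
        return self.lin(adj_norm @ x)

class Encoder(nn.Module):
    """2-layer GNN encoder, 256 hidden dims, dropout 0.5 (per the stated setup)."""
    def __init__(self, in_dim, hid_dim=256):
        super().__init__()
        self.conv1 = GCNLayer(in_dim, hid_dim)
        self.conv2 = GCNLayer(hid_dim, hid_dim)

    def forward(self, x, adj_norm):
        h = F.relu(self.conv1(x, adj_norm))
        h = F.dropout(h, p=0.5, training=self.training)
        return self.conv2(h, adj_norm)

def inner_product_decoder(z, edge_index):
    """Score each candidate edge (u, v) as the inner product <z_u, z_v>."""
    src, dst = edge_index
    return (z[src] * z[dst]).sum(dim=-1)

# Toy usage on a 4-node path graph with 8-dimensional features.
x = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float)
adj_hat = adj + torch.eye(4)                      # add self-loops
deg_inv_sqrt = adj_hat.sum(dim=1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj_hat * deg_inv_sqrt[None, :]

enc = Encoder(in_dim=8)
enc.eval()                                        # disable dropout for inference
z = enc(x, adj_norm)                              # node embeddings, shape (4, 256)
scores = inner_product_decoder(z, torch.tensor([[0, 1], [2, 3]]))  # shape (2,)
```

Training would add a binary cross-entropy loss over positive and negative edges with one of the quoted learning-rate grids, but that loop is omitted here for brevity.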