Node-Time Conditional Prompt Learning in Dynamic Graphs

Authors: Xingtong Yu, Zhenghao Liu, Xinming Zhang, Yuan Fang

ICLR 2025

Reproducibility Variable | Result | LLM Response (supporting excerpt)
Research Type | Experimental | "Finally, we thoroughly evaluate and analyze DYGPROMPT through extensive experiments on four public datasets." Sect. 5 (Experiments): "In this section, we conduct experiments to evaluate DYGPROMPT and analyze the empirical results." Sect. 5.1 (Experimental Setup), Datasets: "We utilize four benchmark datasets for evaluation: Wikipedia, Reddit, MOOC and Genre." Table 1 reports AUC-ROC (%) for temporal node classification and link prediction.
Researcher Affiliation | Academia | Xingtong Yu(1), Zhenghao Liu(2), Xinming Zhang(2), Yuan Fang(1); (1) Singapore Management University, (2) University of Science and Technology of China. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: DOWNSTREAM PROMPT TUNING FOR DYGPROMPT
Open Source Code | No | "For all open-source baselines, we utilize the officially provided code. For the non-open-source CPDG and TIGPrompt, we use our own implementations."
Open Datasets | Yes | "Wikipedia represents a month of modifications made by contributors on Wikipedia pages (Ferschke et al., 2012). Reddit represents a dynamic network between posts and users on subreddits... MOOC is a student-course dataset... Genre is a dynamic network linking users to music genres..." Additionally: "We evaluate DYGPROMPT under the same setting introduced in Sect. 5.1 on a large-scale dataset DGraph (Huang et al., 2022)" and "While most of our datasets involve binary classification, we also conduct five-way node classification on the ML-Rating dataset (Harper & Konstan, 2015)".
Dataset Splits | Yes | "Specifically, given a chronologically ordered sequence of events (i.e., edges with timestamps), we use the first 80% for pre-training. Note that we pre-train a DGNN only once for each dataset and subsequently employ the same pre-trained model for all downstream tasks. The remaining 20% of the events are used for downstream tasks. Specifically, they are further split into 1%/1%/18% subsets, where the first 1% is reserved as the training pool for downstream prompt tuning, the next 1% as the validation pool, and the last 18% for downstream testing."
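The chronological 80%/1%/1%/18% protocol above can be sketched as follows. This is a minimal illustration of the split as described, not the authors' code; the event-tuple layout `(src, dst, t)` is our assumption.

```python
# Hedged sketch of DYGPROMPT's chronological event split (not the authors' code).
# Events are assumed already sorted by timestamp, as the paper requires.

def chronological_split(events):
    """Partition a time-ordered event list into pre-training (80%),
    downstream prompt-tuning pool (1%), validation pool (1%),
    and downstream test (remaining ~18%) segments."""
    n = len(events)
    pre_end = int(n * 0.80)              # first 80%: pre-train the DGNN once
    train_end = pre_end + int(n * 0.01)  # next 1%: prompt-tuning training pool
    val_end = train_end + int(n * 0.01)  # next 1%: validation pool
    return (events[:pre_end],
            events[pre_end:train_end],
            events[train_end:val_end],
            events[val_end:])            # last ~18%: downstream testing

# Example with 10,000 synthetic (src, dst, timestamp) events.
events = [(i % 50, i % 70, float(i)) for i in range(10_000)]
pre, train, val, test = chronological_split(events)
print(len(pre), len(train), len(val), len(test))  # 8000 100 100 1800
```

Because the split is purely chronological, every downstream event occurs strictly after all pre-training events, so no future information leaks into pre-training.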
Hardware Specification | Yes | Operating system: Windows 11; CPU: 13th Gen Intel(R) Core(TM) i5-13600KF; GPU: GeForce RTX 4070 Ti (12 GB).
Software Dependencies | No | The paper does not provide version numbers for software dependencies such as libraries or frameworks (e.g., PyTorch, TensorFlow, or CUDA); it mentions only the Adam optimizer and general model architectures.
Experiment Setup | Yes | "For all experiments, we use the Adam optimizer. For both GCN and GAT in Roland, we employ a 2-layer architecture. For TGAT and TGN, we sample 20 temporal neighbors per node to update their representations. For all baselines, we set the hidden dimension to 172 for Wikipedia, Reddit, and MOOC, and to 86 for Genre. For our proposed DYGPROMPT, we conduct experiments using TGN and TGAT as backbones. We employ a two-layer perceptron with a bottleneck structure as the condition-net, and set the hidden dimension of the condition-net to 86 for Wikipedia, Reddit, and MOOC, and 43 for Genre. We set the hidden dimension to 172 for Wikipedia, Reddit, and MOOC, and to 86 for Genre."
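To make the bottleneck condition-net dimensions concrete, here is a rough bookkeeping sketch. The layer composition Linear(d, d/2) -> nonlinearity -> Linear(d/2, d) is our assumption from "dual-layer perceptron with bottleneck structure"; the paper excerpt states only the hidden dimensions (172/86 overall, 86/43 for the condition-net).

```python
# Hedged sketch (assumed architecture, not the authors' code): parameter count
# of a two-layer bottleneck MLP d_in -> d_mid -> d_in, as the condition-net
# plausibly looks given the stated dimensions.

def bottleneck_params(d_in, d_mid, bias=True):
    """Weights (and optional biases) of Linear(d_in, d_mid) + Linear(d_mid, d_in)."""
    weights = d_in * d_mid + d_mid * d_in
    biases = (d_mid + d_in) if bias else 0
    return weights + biases

# Wikipedia / Reddit / MOOC: overall hidden dim 172, condition-net dim 86.
print(bottleneck_params(172, 86))  # 29842
# Genre: overall hidden dim 86, condition-net dim 43.
print(bottleneck_params(86, 43))   # 7525
```

Halving the width in the middle layer keeps the condition-net small relative to the backbone, which is consistent with its role as a lightweight prompt-generating module.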