Generative Risk Minimization for Out-of-Distribution Generalization on Graphs
Authors: Song Wang, Zhen Tan, Yaochen Zhu, Chuxu Zhang, Jundong Li
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further conduct extensive experiments across a variety of real-world graph datasets for both node-level and graph-level OOD generalization, and the results demonstrate the superiority of our framework GRM. |
| Researcher Affiliation | Academia | Song Wang (Department of Electrical and Computer Engineering, University of Virginia); Zhen Tan (Department of Electrical and Computer Engineering, University of Virginia); Yaochen Zhu (Department of Electrical and Computer Engineering, University of Virginia); Chuxu Zhang (School of Computing, University of Connecticut); Jundong Li (Department of Electrical and Computer Engineering, University of Virginia) |
| Pseudocode | No | The paper describes the methodology and objective functions using mathematical equations and textual explanations, but it does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is provided at https://github.com/SongW-SW/GRM. |
| Open Datasets | Yes | In our node-level OOD generalization experiments, we evaluate GRM and other state-of-the-art baselines on six real-world datasets that cover different topics and tasks, following EERM (Wu et al., 2022a). We summarize the statistics of these datasets in Table 1. Specifically, we use datasets that involve three different types of distribution shifts: (1) Artificial Transformation denotes that synthetic spurious features are added to these datasets; (2) Cross-Domain Transfers means that each domain in the datasets corresponds to a graph distinct from each other; (3) Temporal Evolution means that the datasets are dynamic with evolving nature. Each type includes two datasets. More details about these datasets can be found in Appendix B. |
| Dataset Splits | Yes | For training and validation, we utilize one graph each, while the classification accuracy is reported on the remaining graphs. This process is performed ten times for each dataset, resulting in ten graphs with different domain IDs. The Elliptic dataset consists of a series of 49 graph snapshots... Consequently, we exclude these snapshots and focus on the 7th to 11th, 12th to 17th, and 17th to 49th snapshots for training, validation, and testing, respectively. The Arxiv dataset comprises 169,343 Arxiv CS papers... Specifically, the dataset consists of papers published before 2011 for training, papers from 2011 to 2014 for validation, and papers after 2014 for testing. |
| Hardware Specification | Yes | During training, we conduct all experiments on one NVIDIA A6000 GPU with 48GB of memory. |
| Software Dependencies | Yes | The package requirements of our experiments are listed below. Python == 3.7.10, torch == 1.8.1, numpy == 1.18.5, scipy == 1.5.3, networkx == 2.5.1, scikit-learn == 0.24.1, pandas == 1.2.3 |
| Experiment Setup | Yes | Specifically, we use the Adam optimizer (Kingma & Ba, 2015) for training. The dropout rate is set as 0.3, and the weight decay rate is 0.001. The learning rate is set as 0.01. Given an input graph, we utilize two 2-layer GCNs (Kipf & Welling, 2017), with a hidden dimension size of 128, to learn domain-specific representations and node representations. Then we concatenate these two representations as the input of our VAE-based generator. The encoder of the generator is also implemented as a 2-layer GCN. The dimension of latent variables (i.e., dz) is set as 128. For the specific values of L and P in selecting nodes for learning domain-specific representations, we set them as 3 and 1.5, respectively. For the neighborhood size of the computation graph G of node v, i.e., L, we set it as 2. In other words, two-hop neighbors will be included in the computation graph G. We run 5 times for this process and aggregate the classification results. |
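The setup row above describes a backbone of two 2-layer GCNs (Kipf & Welling, 2017) with a hidden dimension of 128. As a rough, framework-free illustration of that building block, the GCN propagation rule can be sketched in NumPy (the listed dependency); this is not the authors' implementation, which uses PyTorch, and the helper names `normalize_adj` and `gcn_layer` are our own.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf & Welling (2017)."""
    A = A + np.eye(A.shape[0])          # add self-loops
    d = A.sum(axis=1)                   # node degrees (>= 1 after self-loops)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(A_hat, X, W):
    """One GCN layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

rng = np.random.default_rng(0)
n_nodes, in_dim, hidden = 6, 10, 128    # hidden size 128 per the quoted setup

# Toy symmetric adjacency and node features.
A = (rng.random((n_nodes, n_nodes)) < 0.3).astype(float)
A = np.maximum(A, A.T)
X = rng.standard_normal((n_nodes, in_dim))

A_hat = normalize_adj(A)
W1 = rng.standard_normal((in_dim, hidden)) * 0.1
W2 = rng.standard_normal((hidden, hidden)) * 0.1

# Two-layer GCN: each layer mixes information from one-hop neighbors,
# so the output depends on the two-hop computation graph of each node.
H = gcn_layer(A_hat, gcn_layer(A_hat, X, W1), W2)
print(H.shape)  # (6, 128)
```

In the paper's pipeline, two such GCNs produce domain-specific and node representations, which are concatenated and fed to the VAE-based generator; training then uses Adam with learning rate 0.01, weight decay 0.001, and dropout 0.3, as quoted above.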