Out-of-Distribution Generalization on Graphs via Progressive Inference

Authors: Yiming Xu, Bin Shi, Zhen Peng, Huixiang Liu, Bo Dong, Chen Chen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our proposed GPro outperforms the state-of-the-art methods by 4.91% on average. For datasets with more severe distribution shifts, the performance improvement can be up to 6.86%."
Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Xi'an Jiaotong University; (2) Shaanxi Provincial Key Laboratory of Big Data Knowledge Engineering, Xi'an Jiaotong University; (3) School of Distance Education, Xi'an Jiaotong University; (4) University of Virginia, Charlottesville, Virginia, USA
Pseudocode | No | The paper describes the methodology in text and mathematical equations but does not include a clearly labeled pseudocode or algorithm block in the main text. It notes that "The details of our algorithm are summarized in the Appendix," but the appendix was not available for analysis.
Open Source Code | Yes | https://github.com/yimingxu24/GPro
Open Datasets | Yes | "We use three benchmark graph classification datasets in causal learning (Fan et al. 2022), namely CMNIST-75sp, CFashion-75sp, and CKuzushiji-75sp, to evaluate the performance of the models on out-of-distribution (OOD) problems."
Dataset Splits | Yes | The datasets are divided into training, validation, and test sets in a 10K:5K:10K ratio.
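The 10K:5K:10K protocol can be illustrated with a minimal index-based split. This is a hypothetical sketch, not the paper's loading code; the actual benchmarks (CMNIST-75sp and variants) ship with predefined splits.

```python
# Hypothetical contiguous index split mirroring the reported
# 10K:5K:10K train/validation/test protocol.
n_train, n_val, n_test = 10_000, 5_000, 10_000
indices = list(range(n_train + n_val + n_test))

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```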
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, CPU, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and GCN but does not give version numbers for these or other software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We use the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.01. For Eq. (7) and Eq. (8), we use a 2-layer GCN (Kipf and Welling 2017) with 146 hidden dimensions as the encoder. We train GPro for 200 epochs and add the L_cou loss at the 100th epoch. The batch size is 256. The default number of causal and non-causal substructure context inference blocks is 2, with ρ values of 0.9 and 0.8, respectively. We set q of the GCE loss to 0.7 to amplify the focus on the non-causal part, with λ1 = 15, λ2 = 0.01, and λ3 = 1.
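The GCE term referenced above is the generalized cross-entropy of Zhang and Sabuncu (2018), L_GCE = (1 - p_y^q) / q, which interpolates between cross-entropy (q → 0) and MAE (q = 1). Below is a minimal NumPy sketch with the reported q = 0.7; the function name and inputs are illustrative assumptions, and GPro's actual implementation lives in the linked repository.

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross-entropy: mean of (1 - p_y^q) / q,
    where p_y is the predicted probability of the true class."""
    p_y = probs[np.arange(len(labels)), labels]
    return float(np.mean((1.0 - p_y ** q) / q))

# Compared with cross-entropy, GCE down-weights the gradient from
# low-confidence examples, which here amplifies focus on the
# non-causal part of the graph.
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
labels = np.array([0, 1])
loss = gce_loss(probs, labels, q=0.7)
```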