Enhancing Graph Invariant Learning from a Negative Inference Perspective

Authors: Kuo Yang, Zhengyang Zhou, Qihe Huang, Wenjie Du, Limin Li, Wu Jiang, Yang Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive evaluation of NeGo on real-world datasets and synthetic datasets across domains. NeGo outperforms baselines on nearly all datasets, which verifies the effectiveness of our framework. ... We conduct extensive experiments on both synthetic and real-world datasets with distribution shifts to evaluate the performance of NeGo. The results from both visualization and quantitative analysis indicate that our framework successfully achieves accurate prediction in complex environmental scenarios.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China (USTC), Hefei, China; 2 Suzhou Institute for Advanced Research, USTC, Suzhou, China; 3 China Mobile Communications Group Co., Ltd.
Pseudocode | Yes | Algorithm 1: The training process of NeGo
Open Source Code | No | The paper mentions implementing the model with PyTorch and performing experiments, but does not provide a specific link to the source code, an explicit statement of code release, or indicate that code is included in supplementary materials.
Open Datasets | Yes | We adopt two synthetic datasets with distribution shift and six real-world scenario shift datasets from both molecular and social science domains. Synthetic datasets include GOOD-Motif (Wu et al., 2022c) and GOOD-CMNIST (Gui et al., 2022). In molecular property prediction fields, we select the scaffold and size splits of the GOOD-HIV dataset (Gui et al., 2022; Wu et al., 2018) and the assay and size splits of the DrugOOD LBAP-core-ic50 dataset (Ji et al., 2022). We also choose two social sentiment graph datasets with distribution shifts, including GOOD-SST2 and GOOD-Twitter (Yuan et al., 2022).
Dataset Splits | Yes | Detailed statistics on the number of graphs in those datasets are provided in Tab. 6. Table 6 columns: Dataset, Training, ID validation, ID test, OOD validation, OOD test.
Hardware Specification | Yes | We implement our NeGo and parts of the baselines with PyTorch 1.10.1 on a server with an NVIDIA A100-PCIE-40GB.
Software Dependencies | Yes | We implement our NeGo and parts of the baselines with PyTorch 1.10.1 on a server with an NVIDIA A100-PCIE-40GB.
Experiment Setup | Yes | During the training stage, we employ the Adam optimizer. We set the maximum number of training epochs to 200. The batch size of training is set as 32, except for GOOD-CMNIST, which uses a batch size of 64. For GOOD-Motif, GOOD-CMNIST, and GOOD-SST2, the learning rate is set to 5×10⁻⁴. For GOOD-HIV, GOOD-Twitter, and DrugOOD, we exploit a learning rate of 10⁻⁴. Additionally, we utilize a weight decay of 10⁻⁴ to help with regularization and prevent overfitting.
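The reported optimizer settings can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the `torch.nn.Linear` model is a hypothetical stand-in for the NeGo architecture (a GNN in the paper), and only a single training step is shown.

```python
import torch

# Hypothetical stand-in for the NeGo model; the paper's actual architecture is a GNN.
model = torch.nn.Linear(8, 2)

# Hyperparameters as reported in the paper.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,            # 5x10^-4 for GOOD-Motif / GOOD-CMNIST / GOOD-SST2; 10^-4 for the others
    weight_decay=1e-4,  # weight decay of 10^-4 for regularization
)

max_epochs = 200  # maximum number of training epochs
batch_size = 32   # 64 for GOOD-CMNIST

# One illustrative optimization step on random data in place of the full loop.
x = torch.randn(batch_size, 8)
y = torch.randint(0, 2, (batch_size,))
loss = torch.nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a full run, the step above would sit inside a loop over `max_epochs` epochs and the training batches, with the learning rate chosen per dataset as listed.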