Learning Graph Invariance by Harnessing Spuriosity

Authors: Tianjun Yao, Yongqiang Chen, Kai Hu, Tongliang Liu, Kun Zhang, Zhiqiang Shen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on synthetic datasets demonstrate that LIRS is able to learn more invariant features compared to state-of-the-art graph invariant learning methods that adopt the direct invariant learning paradigm. Furthermore, LIRS shows superior OOD performance on real-world datasets with various types of distribution shifts, highlighting its effectiveness in learning graph invariant features. Our contributions can be summarized as follows: [...] Extensive experiments demonstrate that LIRS outperforms the second-best baseline methods by up to 25.50% across 17 competitive baselines on both synthetic and real-world datasets with various distribution shifts.
Researcher Affiliation | Academia | 1. Mohamed bin Zayed University of Artificial Intelligence; 2. Carnegie Mellon University; 3. The University of Sydney
Pseudocode | Yes | Algorithm 1: The LIRS framework
Open Source Code | Yes | Code is available at https://github.com/tianyao-aka/LIRS-ICLR2025
Open Datasets | Yes | We adopt the GOODMotif and GOODHIV datasets (Gui et al., 2022) and the OGBG-Molbace and OGBG-Molbbbp datasets (Hu et al., 2020; Wu et al., 2018) to comprehensively evaluate the OOD generalization performance of our proposed framework.
Dataset Splits | Yes | Table 7: Details about the datasets used in our experiments. For example, GOOD-HIV (Scaffold split): 24,682 training / 4,113 validation / 4,108 testing graphs, 2 classes, metric ROC-AUC.
Hardware Specification | Yes | We run all the experiments on Linux servers with RTX 4090 GPUs and CUDA 12.2.
Software Dependencies | No | All experiments are run with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). We adopt the PyGCL (Zhu et al., 2021) package and modify the source code of DualBranchContrast to implement the biased infomax that generates spurious embeddings. To generate logits from the spurious embeddings, we use MiniBatchKMeans, LinearSVC, and CalibratedClassifierCV from the scikit-learn package (Pedregosa et al., 2011). No specific version numbers are provided for these libraries, only the publication years of their respective papers.
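The spurious-logit step described above (cluster spurious embeddings with MiniBatchKMeans, then fit a LinearSVC wrapped in CalibratedClassifierCV to obtain probabilities) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `spurious_logits`, the cluster count, the calibration settings, and the log-of-probability conversion are all assumptions.

```python
# Hypothetical sketch of the spurious-logit pipeline: pseudo-label graph
# embeddings by cluster, then train a calibrated linear classifier so that
# per-cluster probabilities (and hence logits) are available.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def spurious_logits(embeddings: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Map spurious embeddings to an (n_samples, n_clusters) logit matrix."""
    # Step 1: assign each embedding a pseudo-label via clustering.
    pseudo = MiniBatchKMeans(n_clusters=n_clusters, n_init=10,
                             random_state=0).fit_predict(embeddings)
    # Step 2: LinearSVC has no predict_proba, so wrap it in
    # CalibratedClassifierCV to calibrate decision scores into probabilities.
    clf = CalibratedClassifierCV(LinearSVC(), cv=3)
    clf.fit(embeddings, pseudo)
    # Step 3: turn calibrated probabilities into logits (epsilon for log(0)).
    probs = clf.predict_proba(embeddings)
    return np.log(probs + 1e-12)

# Example with 300 random 16-dim "embeddings" standing in for real ones.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 16))
logits = spurious_logits(Z)
print(logits.shape)
```

Calibration is the reason for the wrapper: a bare LinearSVC exposes only signed margins, while downstream reweighting needs probability-like scores.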
Experiment Setup | Yes | Optimization and evaluation. By default, we use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3 and a batch size of 64 for all experiments. We also employ early stopping with a patience of 10 epochs based on validation performance for all datasets. [...] Hyperparameter search for LIRS. The penalty weight for L_Inv in LIRS is searched over {1e-1, 1e-2, 1e-3}. The reweighting coefficient γ is searched over {0.1, 0.3, 0.5, 0.7, 0.9}. The cluster number C is searched over {3, 5, 10}. The training epoch E at which the spurious embedding is derived from the biased infomax is searched over {50, 60, 70, 80, 90} for real-world datasets, and over {5, 6, 7, 8, 9} for the synthetic datasets.
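The quoted hyperparameter search enumerates to a modest grid. The sketch below lists the real-world-dataset configurations; treating the search as an exhaustive grid sweep (rather than, say, random search) is an assumption, and the dictionary keys are illustrative names for the quantities in the text.

```python
# Enumerate the LIRS hyperparameter grid quoted above (real-world datasets).
from itertools import product

grid = {
    "penalty_weight": [1e-1, 1e-2, 1e-3],        # weight for L_Inv
    "gamma": [0.1, 0.3, 0.5, 0.7, 0.9],          # reweighting coefficient
    "n_clusters": [3, 5, 10],                    # cluster number C
    "epoch_E": [50, 60, 70, 80, 90],             # biased-infomax epoch E
}

# Cartesian product: one dict per candidate configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 3 * 5 * 3 * 5 = 225
```

With the synthetic-dataset values of E ({5, ..., 9}) the grid has the same size, so each dataset family requires at most 225 training runs per seed.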