Learning Graph Invariance by Harnessing Spuriosity
Authors: Tianjun Yao, Yongqiang Chen, Kai Hu, Tongliang Liu, Kun Zhang, Zhiqiang Shen
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on synthetic datasets demonstrate that LIRS is able to learn more invariant features compared to state-of-the-art graph invariant learning methods that adopt the direct invariant learning paradigm. Furthermore, LIRS shows superior OOD performance on real-world datasets with various types of distribution shifts, highlighting its effectiveness in learning graph invariant features. Our contributions can be summarized as follows: [...] Extensive experiments demonstrate that LIRS outperforms second-best baseline methods by up to 25.50% across 17 competitive baselines on both synthetic and real-world datasets with various distribution shifts. |
| Researcher Affiliation | Academia | Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University; The University of Sydney |
| Pseudocode | Yes | Algorithm 1 The LIRS framework |
| Open Source Code | Yes | Code is available at https://github.com/tianyao-aka/LIRS-ICLR2025 |
| Open Datasets | Yes | We adopt GOODMotif and GOODHIV datasets (Gui et al., 2022), OGBG-Molbace and OGBG-Molbbbp datasets (Hu et al., 2020; Wu et al., 2018) to comprehensively evaluate the OOD generalization performance of our proposed framework. |
| Dataset Splits | Yes | Table 7: Details about the datasets used in our experiments. For example, GOOD-HIV (Scaffold split): 24,682 training / 4,113 validation / 4,108 testing graphs, 2 classes, evaluated with ROC-AUC. |
| Hardware Specification | Yes | We run all the experiments on Linux servers with RTX 4090 and CUDA 12.2. |
| Software Dependencies | No | All the experiments are run with PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). We adopt the PyGCL (Zhu et al., 2021) package and modify the source code in DualBranchContrast to implement the biased infomax to generate spurious embeddings. To generate logits from the spurious embeddings, we use MiniBatchKMeans, LinearSVC, and CalibratedClassifierCV from the Scikit-Learn package (Pedregosa et al., 2011). No specific version numbers for these software libraries are provided, only the publication years of their respective papers. |
| Experiment Setup | Yes | Optimization and evaluation. By default, we use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 1e-3 and a batch size of 64 for all experiments. We also employ early stopping with a patience of 10 epochs according to the validation performance for all datasets. [...] Hyperparameter search for LIRS. The penalty weight for LInv in LIRS is searched over {1e-1, 1e-2, 1e-3}. The reweighting coefficient γ is searched over {0.1, 0.3, 0.5, 0.7, 0.9}. The cluster number C is searched over {3, 5, 10}. The training epoch E at which the spurious embedding is derived from the biased infomax is searched over {50, 60, 70, 80, 90} for real-world datasets; for the synthetic datasets, the training epoch E is searched over {5, 6, 7, 8, 9}. |
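The software-dependencies row mentions a scikit-learn pipeline (MiniBatchKMeans, LinearSVC, CalibratedClassifierCV) for turning spurious embeddings into logits. A minimal sketch of how those three components compose, assuming the embeddings are a plain `(n_samples, dim)` array; the variable names and the random stand-in data are illustrative, not taken from the LIRS code:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
# Stand-in for spurious embeddings produced by the biased infomax step.
spurious_emb = rng.normal(size=(200, 16))

# 1. Cluster the embeddings; the cluster assignments serve as pseudo-labels.
#    C = 3 matches one value from the paper's search grid {3, 5, 10}.
C = 3
kmeans = MiniBatchKMeans(n_clusters=C, random_state=0)
pseudo_labels = kmeans.fit_predict(spurious_emb)

# 2. Fit a linear SVM on the pseudo-labels, wrapped in CalibratedClassifierCV
#    so that raw decision scores become probability-like logits.
clf = CalibratedClassifierCV(LinearSVC(), cv=3)
clf.fit(spurious_emb, pseudo_labels)
logits = clf.predict_proba(spurious_emb)
print(logits.shape)  # → (200, 3)
```

Calibration is needed here because `LinearSVC` has no `predict_proba` of its own; `CalibratedClassifierCV` fits a calibrator on held-out folds to produce per-cluster probabilities.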
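The experiment-setup row lists four hyperparameter search ranges. A quick sketch of the implied grid size (the paper does not state whether the search is exhaustive; a full Cartesian product is assumed here for illustration):

```python
from itertools import product

# Search ranges quoted from the paper (real-world datasets).
penalty_weights = [1e-1, 1e-2, 1e-3]    # penalty weight for L_Inv
gammas = [0.1, 0.3, 0.5, 0.7, 0.9]      # reweighting coefficient γ
cluster_numbers = [3, 5, 10]            # cluster number C
epochs = [50, 60, 70, 80, 90]           # epoch E for the spurious embedding

grid = list(product(penalty_weights, gammas, cluster_numbers, epochs))
print(len(grid))  # → 225 (3 * 5 * 3 * 5) configurations if searched exhaustively
```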