Learning Invariant Causal Mechanism from Vision-Language Models

Authors: Zeen Song, Siyu Zhao, Xingyu Zhang, Jiangmeng Li, Changwen Zheng, Wenwen Qiang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on several OOD datasets show that CLIP-ICM significantly improves the performance of CLIP. Our method offers a simple but powerful enhancement, boosting the reliability of CLIP in real-world applications. The source code is available at https://github.com/ZeenSong/CLIP-ICM. ... To evaluate the CLIP-ICM framework in OOD scenarios, we conduct experiments on the DomainBed benchmark (Gulrajani & Lopez-Paz)."
Researcher Affiliation | Academia | "1Institute of Software, Chinese Academy of Sciences, Beijing, China; 2University of the Chinese Academy of Sciences. Correspondence to: Wenwen Qiang <EMAIL>."
Pseudocode | Yes | "H.3. Pseudo Code: The pseudocode of CLIP-ICM is illustrated in Algorithm 1. (Algorithm 1: CLIP-ICM)"
Open Source Code | Yes | "The source code is available at https://github.com/ZeenSong/CLIP-ICM."
Open Datasets | Yes | "We conduct an experiment on the Terra Incognita dataset (Beery et al., 2018)... We evaluate the proposed CLIP-ICM on OOD generalization datasets, including DomainBed (Gulrajani & Lopez-Paz) and variants of ImageNet (Recht et al., 2019; Hendrycks et al., 2021a;b; Wang et al., 2019). ... We use five datasets from DomainBed: PACS (Li et al., 2017), VLCS (Fang et al., 2013), OfficeHome (Venkateswara et al., 2017), Terra Incognita (Beery et al., 2018), and DomainNet (Peng et al., 2019)."
Dataset Splits | Yes | "Specifically, for a given target domain, a linear classifier is trained on frozen CLIP image embeddings from all other domains and tested on the held-out domain to assess how well the model handles shifts in distribution. ... For domain shift, we use a leave-one-out protocol, training on all domains except the target and testing on the target domain (Table 2). For the combined setting, we split data into base and new classes, train on base classes in training domains, and evaluate both base and new classes in the target domain (Table 3). ... For all datasets, we first pool the raw training, validation, and testing images together. For each random seed, we then instantiate random training, validation, and testing splits."
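The leave-one-out protocol quoted above (train on all domains except the target, test on the held-out domain) can be sketched as follows. This is an illustrative helper, not code from the CLIP-ICM repository; the domain names and the `(embedding, label)` sample layout are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Domain = str
Sample = Tuple[List[float], int]  # hypothetical: (frozen CLIP embedding, class label)

def leave_one_out_splits(
    data: Dict[Domain, List[Sample]],
) -> Dict[Domain, Tuple[List[Sample], List[Sample]]]:
    """For each target domain, build a (train, test) pair:
    train = samples from every other domain, test = the held-out domain."""
    splits = {}
    for target in data:
        train = [s for d, samples in data.items() if d != target for s in samples]
        test = list(data[target])
        splits[target] = (train, test)
    return splits

# Toy usage with placeholder PACS-style domains:
toy = {
    "photo": [([0.1, 0.2], 0)],
    "art": [([0.3, 0.4], 1)],
    "cartoon": [([0.5, 0.6], 0)],
    "sketch": [([0.7, 0.8], 1)],
}
splits = leave_one_out_splits(toy)
assert len(splits["sketch"][0]) == 3  # trained on the other three domains
assert len(splits["sketch"][1]) == 1  # tested on the held-out domain
```

Under this protocol a linear probe would then be fit on `splits[target][0]` and evaluated on `splits[target][1]` for each target domain in turn.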
Hardware Specification | Yes | "All experiments are conducted on a single NVIDIA RTX A6000 GPU."
Software Dependencies | No | The paper mentions using "GPT-4o (OpenAI, 2023)" as a model for interventional data generation, but does not provide specific version numbers for ancillary software components (e.g., programming languages, libraries, or frameworks such as Python, PyTorch, or CUDA) used to implement the methodology.
Experiment Setup | Yes | "Each value in Table 2 and Table 3 represents the mean and standard deviation over 5 runs with different random seeds. ... Here, I_{D_inv} denotes the identity matrix of dimension D_inv, and λ is a regularization hyperparameter. ... We conduct an ablation study regarding the choice of D_inv."
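The quoted setup pairs an identity matrix I_{D_inv} with a regularization hyperparameter λ, the standard ingredients of a ridge-style penalty. As a hedged illustration of how such a term typically enters a closed-form linear fit on frozen embeddings (this is a generic sketch, not the paper's exact objective), consider:

```python
import numpy as np

def ridge_fit(X: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge solution W = (X^T X + lam * I_{D})^{-1} X^T Y.

    X: (n, D) matrix of frozen embeddings (D playing the role of D_inv)
    Y: (n, k) matrix of targets (e.g., one-hot labels)
    lam: regularization hyperparameter (the lambda in the quote)
    """
    d = X.shape[1]
    # lam * np.eye(d) is the lambda * I_{D_inv}-style regularizer
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Sanity check on synthetic data: with negligible lam, ridge recovers
# the true linear map from noiseless observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
W_true = rng.normal(size=(8, 3))
Y = X @ W_true
W = ridge_fit(X, Y, lam=1e-6)
assert np.allclose(W, W_true, atol=1e-3)
```

Larger λ shrinks the solution toward zero, trading fit for stability, which is the usual reason such a regularizer is ablated alongside the embedding dimension.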