Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
Authors: Shenyu Lu, Junyi Chai, Xiaoqian Wang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on benchmark datasets, which have shown significant improvements in worst-group accuracy. Additionally, our visualizations of VLMs further demonstrate the effectiveness of this intervention. |
| Researcher Affiliation | Academia | Shenyu Lu, Junyi Chai & Xiaoqian Wang Elmore Family School of Electrical and Computer Engineering Purdue University West Lafayette, IN 47906, USA EMAIL |
| Pseudocode | Yes | We summarize our method in Algorithm 1. |
| Open Source Code | Yes | 1Code at https://github.com/lu876/TIE |
| Open Datasets | Yes | Datasets. We study five well-established benchmark datasets for spurious correlation research: Waterbirds (Koh et al., 2021; Sagawa et al., 2019), CelebA (Liu et al., 2015), ISIC (Codella et al., 2019), COVID-19 (Cohen et al., 2020), FMOW (Christie et al., 2018). |
| Dataset Splits | Yes | Following the protocol established by robust learning studies (Sagawa et al., 2019; Adila et al., 2024), we report three metrics: worst group accuracy (WG), average accuracy (Avg), and the gap between these two metrics (Gap). |
| Hardware Specification | Yes | We conducted all experiments on an Nvidia RTX 3090 GPU with 24 GB of memory, using frozen CLIP models across various datasets. |
| Software Dependencies | No | The paper mentions "Model construction and pre-trained weights are sourced from OpenCLIP (Ilharco et al., 2021)" and "We utilize GPT-4 (OpenAI, 2023)" but does not provide specific version numbers for these or other key software libraries like PyTorch, numpy, or scikit-learn that would be necessary for reproduction. |
| Experiment Setup | Yes | The model was trained using an SGD optimizer with a learning rate of 10⁻⁴, a weight decay of 10⁻³, and a momentum of 0.9, over 200 epochs. |
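The experiment setup row above can be sketched as a concrete training-loop configuration. This is a minimal plain-Python illustration of SGD with momentum and L2 weight decay at the reported hyperparameters (lr 10⁻⁴, weight decay 10⁻³, momentum 0.9, 200 epochs), using the common PyTorch-style update convention; the scalar toy loss and function names are assumptions for illustration, not the paper's actual model or code.

```python
# Reported hyperparameters from the paper's experiment setup.
LR, WEIGHT_DECAY, MOMENTUM, EPOCHS = 1e-4, 1e-3, 0.9, 200

def sgd_step(w, grad, velocity):
    """One SGD-with-momentum step, L2 weight decay folded into the gradient:
    v <- momentum * v + (grad + weight_decay * w);  w <- w - lr * v
    (PyTorch-style convention; a sketch, not the authors' implementation)."""
    v = MOMENTUM * velocity + (grad + WEIGHT_DECAY * w)
    return w - LR * v, v

def train(w0):
    # Toy stand-in objective: loss(w) = (w - 3)^2, so grad = 2 * (w - 3).
    w, v = w0, 0.0
    for _ in range(EPOCHS):
        grad = 2.0 * (w - 3.0)
        w, v = sgd_step(w, grad, v)
    return w

print(train(0.0))  # moves from 0 toward the toy optimum near 3
```

In an actual reproduction this configuration would map to `torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-3, momentum=0.9)` over 200 epochs.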