Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mitigating Spurious Correlations in Zero-Shot Multimodal Models

Authors: Shenyu Lu, Junyi Chai, Xiaoqian Wang

ICLR 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conducted experiments on benchmark datasets, which have shown significant improvements in worst-group accuracy. Additionally, our visualizations of VLMs further demonstrate the effectiveness of this intervention." |
| Researcher Affiliation | Academia | "Shenyu Lu, Junyi Chai & Xiaoqian Wang, Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906, USA" |
| Pseudocode | Yes | "We summarize our method in Algorithm 1." |
| Open Source Code | Yes | "Code at https://github.com/lu876/TIE" |
| Open Datasets | Yes | "Datasets. We study five well-established benchmark datasets for spurious correlation research: Waterbirds (Koh et al., 2021; Sagawa et al., 2019), CelebA (Liu et al., 2015), ISIC (Codella et al., 2019), COVID-19 (Cohen et al., 2020), FMoW (Christie et al., 2018)." |
| Dataset Splits | Yes | "Following the protocol established by robust learning studies (Sagawa et al., 2019; Adila et al., 2024), we report three metrics: worst-group accuracy (WG), average accuracy (Avg), and the gap between these two metrics (Gap)." |
| Hardware Specification | Yes | "We conducted all experiments on an Nvidia RTX 3090 GPU with 24 GB of memory, using frozen CLIP models across various datasets." |
| Software Dependencies | No | The paper mentions "Model construction and pre-trained weights are sourced from OpenCLIP (Ilharco et al., 2021)" and "We utilize GPT-4 (OpenAI, 2023)", but does not provide specific version numbers for these or for other key software libraries (e.g., PyTorch, NumPy, scikit-learn) that would be necessary for reproduction. |
| Experiment Setup | Yes | "The model was trained using an SGD optimizer with a learning rate of 10^-4, a weight decay of 10^-3, and a momentum of 0.9, over 200 epochs." |
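The three metrics quoted in the Dataset Splits row (WG, Avg, Gap) can be sketched as a small helper. This is an illustrative sketch, not the authors' implementation: the function name and inputs are hypothetical, and Avg is computed here as the sample-weighted overall accuracy (some works instead average per-group accuracies).

```python
def group_metrics(correct, total):
    """Compute worst-group accuracy (WG), average accuracy (Avg),
    and the gap between them from per-group counts.

    correct: number of correct predictions in each group.
    total:   number of examples in each group.
    Illustrative sketch only; not the paper's code.
    """
    group_acc = [c / t for c, t in zip(correct, total)]
    wg = min(group_acc)              # worst-group accuracy (WG)
    avg = sum(correct) / sum(total)  # sample-weighted average accuracy (Avg)
    return wg, avg, avg - wg         # Gap = Avg - WG
```

For example, with two groups of sizes 100 and 50 scoring 90 and 40 correct, WG is 0.8 and Avg is 130/150; a small Gap indicates robustness to the spurious attribute that defines the groups.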