CABIN: Debiasing Vision-Language Models Using Backdoor Adjustments

Authors: Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments and analyses, we demonstrate that CABIN effectively mitigates biases and improves fairness metrics while preserving the zero-shot strengths of VLMs. The code is available at: https://github.com/ipangbo/causal-debias
Researcher Affiliation | Academia | 1School of Computer Science, University of Auckland, Auckland, New Zealand 2The Liggins Institute, University of Auckland, Auckland, New Zealand 3Research Centre for Māori Health and Development, Massey University, Wellington, New Zealand EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes its methods using mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at: https://github.com/ipangbo/causal-debias
Open Datasets | Yes | Evaluation Datasets. We use FACET [Gustafson et al., 2023], PATA [Seth et al., 2023], and Flickr30K [Plummer et al., 2015] to evaluate our debiasing method. Traditional face-centric datasets such as FairFace [Karkkainen and Joo, 2021], MS-COCO (MS) [Lin et al., 2014], and Pascal-Sentence (PS) [Rashtchian et al., 2010] are also used to show our method applies to various ranges of datasets and tasks.
Dataset Splits | No | The paper mentions using 'test data Dtest' for attribute distribution estimation and evaluating on several datasets, but it does not specify concrete training/validation/test split percentages, sample counts, or standard splits for the evaluation datasets (FACET, PATA, Flickr30K, FairFace, MS-COCO, Pascal-Sentence). It states only that, for mapper training, 'We randomly sampled 10 million paired image-text data from the dataset'.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments; it refers only generally to 'computational resources'.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA) needed to replicate the experiments.
Experiment Setup | Yes | To obtain high-confidence results for the model, we set ϵ to 0.5. ... The weighting factor λ balances the alignment loss L_align and the contrastive difference loss L_diff... We evaluate three settings (λ = 0, λ = 0.5, and λ = 1).
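As a concrete reading of the Experiment Setup row, the two reported hyperparameters can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes ϵ acts as a probability threshold for keeping high-confidence predictions and that λ enters as a simple weighted sum of the two losses; both forms, and the function names, are hypothetical.

```python
import numpy as np

def high_confidence_mask(probs: np.ndarray, eps: float = 0.5) -> np.ndarray:
    """Keep samples whose top class probability exceeds eps.

    Assumed reading of the paper's 'we set ϵ to 0.5'; the exact
    filtering rule is not specified in the source.
    """
    return probs.max(axis=1) > eps

def total_loss(l_align: float, l_diff: float, lam: float) -> float:
    """Hypothetical weighted combination L = L_align + λ * L_diff.

    The paper states only that λ balances the alignment loss and the
    contrastive difference loss; the additive form is an assumption.
    """
    return l_align + lam * l_diff

# Sweep the three settings the paper evaluates (λ = 0, 0.5, 1).
for lam in (0.0, 0.5, 1.0):
    print(lam, total_loss(0.8, 0.4, lam))
```

Under this reading, λ = 0 reduces training to the alignment loss alone, which matches the paper's use of the three settings as an ablation over the contrastive difference term.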