CABIN: Debiasing Vision-Language Models Using Backdoor Adjustments
Authors: Bo Pang, Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments and analyses, we demonstrate that CABIN effectively mitigates biases and improves fairness metrics while preserving the zero-shot strengths of VLMs. The code is available at: https://github.com/ipangbo/causal-debias |
| Researcher Affiliation | Academia | 1. School of Computer Science, University of Auckland, Auckland, New Zealand; 2. The Liggins Institute, University of Auckland, Auckland, New Zealand; 3. Research Centre for Māori Health and Development, Massey University, Wellington, New Zealand |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual descriptions, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/ipangbo/causal-debias |
| Open Datasets | Yes | Evaluation Datasets. We use FACET [Gustafson et al., 2023], PATA [Seth et al., 2023], and Flickr30K [Plummer et al., 2015] to evaluate our debiasing method. Traditional face-centric datasets such as FairFace [Kärkkäinen and Joo, 2021], MS-COCO (MS) [Lin et al., 2014], and Pascal-Sentence (PS) [Rashtchian et al., 2010] are also used to show our method applies to various ranges of datasets and tasks. |
| Dataset Splits | No | The paper mentions using 'test data Dtest' for attribute distribution estimation and evaluating on several datasets, but it does not specify concrete training/validation/test split percentages, sample counts, or standard splits for the evaluation datasets (FACET, PATA, Flickr30K, FairFace, MS-COCO, Pascal-Sentence). It only states, for the mapper training, that 'We randomly sampled 10 million paired image-text data from the dataset'. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. It only generally refers to 'computational resources'. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiment. |
| Experiment Setup | Yes | To obtain high-confidence results for the model, we set ϵ to 0.5. ... The weighting factor λ balances the alignment loss L_align and the contrastive difference loss L_diff... We evaluate three settings (λ = 0, λ = 0.5, and λ = 1). |
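The experiment-setup row describes a weighting factor λ that balances two training losses. The paper excerpt does not give the exact combination formula, so the sketch below assumes the common weighted-sum form `L = L_align + λ · L_diff`; the function name `total_loss` and the sample loss values are hypothetical.

```python
# Minimal sketch of a lambda-weighted loss combination, assuming the
# common form L = L_align + lambda * L_diff. The exact formula used by
# CABIN is not stated in the excerpt quoted above.

def total_loss(l_align: float, l_diff: float, lam: float) -> float:
    """Weighted sum of the alignment loss and the contrastive difference loss."""
    return l_align + lam * l_diff

if __name__ == "__main__":
    # The three settings evaluated in the paper's ablation.
    for lam in (0.0, 0.5, 1.0):
        # Illustrative (made-up) loss values: l_align=1.0, l_diff=2.0.
        print(f"lambda={lam}: total={total_loss(1.0, 2.0, lam)}")
```

At λ = 0 only the alignment term is optimized, which matches how such ablations isolate the contribution of the second loss term.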