VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering

Authors: Chun-Mei Feng, Yang Bai, Tao Luo, Zhen Li, Salman Khan, Wangmeng Zuo, Rick Siow Mong Goh, Yong Liu

AAAI 2025

Reproducibility assessment (each entry lists the variable, the assessed result, and the supporting LLM response excerpt):
Research Type: Experimental. Evidence: "Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets. ... Extensive experiments are conducted on the CIRR and Fashion-IQ datasets. The results show that our VQA4CIR can be incorporated with different CIR methods and outperforms the state-of-the-art CIR methods. ... Experimental results show that our VQA4CIR outperforms the state-of-the-art CIR methods and can be directly plugged into existing CIR methods."
Researcher Affiliation: Academia. Evidence: (1) Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore; (2) SSE, The Chinese University of Hong Kong, Shenzhen (CUHK), China; (3) Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE; (4) Australian National University, Canberra ACT, Australia; (5) Harbin Institute of Technology, Harbin, China.
Pseudocode: No. Evidence: the paper describes its methodology through textual explanations and illustrative figures (Figures 2, 3, and 4) but does not include any explicit pseudocode blocks or labeled algorithms.
Open Source Code: Yes. Code: https://github.com/chunmeifeng/VQA4CIR
Open Datasets: Yes. Evidence: "Experimental results show that our proposed method outperforms state-of-the-art CIR methods on the CIRR and Fashion-IQ datasets. ... Extensive experiments are conducted on the CIRR and Fashion-IQ datasets. The detailed setups follow previous works (Suhr et al. 2018; Wu et al. 2021)."
Dataset Splits: Yes. Evidence: "During training, we randomly adopt 5,000 and 3,000 samples from the CIRR dataset and Fashion-IQ training data, respectively, to fine-tune LLaMA (Touvron et al. 2023) and LLaVA (Liu et al. 2023a). ... We evaluate our method on two CIR benchmarks, i.e., CIRR (Suhr et al. 2018) and Fashion-IQ (Wu et al. 2021). The detailed setups follow previous works (Suhr et al. 2018; Wu et al. 2021)."
Hardware Specification: Yes. Evidence: "Our VQA4CIR is implemented with PyTorch on NVIDIA RTX A100 GPUs with 40GB of memory per card."
Software Dependencies: Yes. Evidence: "Our VQA4CIR is implemented with PyTorch on NVIDIA RTX A100 GPUs with 40GB of memory per card. To preserve the generalization ability of the pre-trained models, i.e., LLaMA (Touvron et al. 2023) and LLaVA (Liu et al. 2023a), we leverage LoRA (Hu et al. 2021) to fine-tune them while keeping the backbones frozen, i.e., LLaVA-v1.5-13B and Vicuna-13B-v1.5."
Experiment Setup: Yes. Evidence: "AdamW (Loshchilov and Hutter 2017) is adopted as the optimizer with a weight decay of 0.05 across all the experiments. We adopt Warmup Decay LR as the learning rate scheduler with 1,000 warmup iterations. For LLaVA (Liu et al. 2023a), the learning rate is initialized at 2e-5, while for LLaMA (Touvron et al. 2023), it is initialized at 3e-4. The hyperparameter α is set to 20 and 30 on the CIRR and Fashion-IQ datasets, respectively, while β is empirically set to 10 and 12."
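The reported schedule (linear warmup over 1,000 iterations, base learning rates of 2e-5 for LLaVA and 3e-4 for LLaMA) can be sketched as a plain function. This is a minimal illustration, not the authors' code: the function name `warmup_decay_lr`, the total iteration count, and the linear decay after warmup are all assumptions, since the paper only names "Warmup Decay LR" without giving the decay form.

```python
def warmup_decay_lr(step, base_lr, warmup_iters=1000, total_iters=10000):
    """Learning rate at a given step: linear warmup, then linear decay.

    `total_iters` and the linear-decay tail are illustrative assumptions;
    the paper specifies only the warmup length (1,000 iterations).
    """
    if step < warmup_iters:
        # Linearly ramp from 0 up to base_lr over the warmup window.
        return base_lr * step / warmup_iters
    # Linearly decay from base_lr down to 0 by total_iters (assumed form).
    return base_lr * max(0.0, (total_iters - step) / (total_iters - warmup_iters))


# Example: the two reported base learning rates at the end of warmup.
lr_llava = warmup_decay_lr(1000, 2e-5)   # LLaVA: reaches 2e-5
lr_llama = warmup_decay_lr(1000, 3e-4)   # LLaMA: reaches 3e-4
```

In PyTorch this would typically be wrapped in `torch.optim.lr_scheduler.LambdaLR` around an `AdamW` optimizer with `weight_decay=0.05`, matching the reported setup.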
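The paper fine-tunes LLaMA and LLaVA with LoRA (Hu et al. 2021) while keeping the backbones frozen. The idea can be sketched as a toy linear layer: the pretrained weight W stays fixed, and only a low-rank update B·A (scaled by alpha/r) is trained. The class name `LoRALinear`, the rank, and the scaling values below are illustrative assumptions; the paper does not report its LoRA hyperparameters, and the real models apply such adapters inside transformer layers via a library like PEFT.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA-adapted linear layer: y = x W^T + (alpha/r) * x A^T B^T.

    W is the frozen pretrained weight; only A and B would be trained.
    Rank r and alpha are illustrative, not values from the paper.
    """

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = weight                                      # frozen, shape (out, in)
        self.A = rng.normal(0.0, 0.01, (r, weight.shape[1]))  # trainable down-projection
        self.B = np.zeros((weight.shape[0], r))               # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Zero-initialized B means the adapter starts as a no-op,
        # so fine-tuning begins exactly at the pretrained model.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

The zero-initialized `B` is the standard LoRA trick: at step 0 the adapted layer reproduces the frozen backbone exactly, which is what preserves the pretrained models' generalization ability.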