PatternCIR Benchmark and TisCIR: Advancing Zero-Shot Composed Image Retrieval in Remote Sensing

Authors: Zhechun Liang, Tao Huang, Fangfang Wu, Shiwen Xue, Zhenyu Wang, Weisheng Dong, Xin Li, Guangming Shi

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, our TisCIR demonstrated excellent performance in RSCIR, surpassing ZS-CIR methods designed for natural images and achieving state-of-the-art performance. Additionally, we validated the effect of the FGIA module by visualizing the image-embedding attention. We confirmed that the module can effectively remove the information that needs to be replaced while retaining the fine-grained information in the rest of the image.
Researcher Affiliation | Academia | Zhechun Liang1, Tao Huang1, Fangfang Wu1, Shiwen Xue1, Zhenyu Wang1, Weisheng Dong1,3, Xin Li2 and Guangming Shi1,3. 1Xidian University; 2State University of New York at Albany; 3Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education
Pseudocode | No | The paper describes the methodology for ZS-QTG and TisCIR in detail, including mathematical formulations and workflow diagrams (Figs. 2, 3, 4, 5). However, it does not present any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The data and code are available here.
Open Datasets | Yes | We propose a model for generating the text query, called the Zero-Shot Query Text Generator (ZS-QTG). Using ZS-QTG, we constructed PatternCIR, the first fine-grained composed image retrieval benchmark in RSCIR.
Dataset Splits | Yes | Finally, we obtained a total of 17700 triplets, which are divided into test, validation, and training sets in a ratio of 1.5:1.5:7.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used to run its experiments.
Software Dependencies | No | The paper mentions using the AdamW optimizer and refers to models such as CLIP and RemoteCLIP, but does not provide version numbers for software libraries, programming languages (e.g., Python, PyTorch), or other ancillary software components.
Experiment Setup | Yes | We use the AdamW optimizer [Loshchilov and Hutter, 2017] with a fixed learning rate of 0.0001, a weight decay of 0.01, and a batch size of 256 for the first stage of training, whereas a batch size of 32 is used for the second stage. Dropout with a probability of 50% is applied for regularization.
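The reported split (17700 triplets at a 1.5:1.5:7 ratio) can be sanity-checked with a few lines of arithmetic. This is a sketch for verification only; the paper quote gives the total and ratio but does not state per-split counts, so the numbers below are derived, not quoted:

```python
# Derive per-split sizes from the reported total and ratio.
total = 17700
ratio = {"test": 1.5, "validation": 1.5, "training": 7.0}
unit = total / sum(ratio.values())  # 1770 triplets per ratio unit
sizes = {name: round(r * unit) for name, r in ratio.items()}
print(sizes)  # {'test': 2655, 'validation': 2655, 'training': 12390}
```

The ratio sums to 10, so each unit is exactly 1770 triplets and the derived counts sum back to the stated total with no rounding loss.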
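For reference, the decoupled weight decay that distinguishes AdamW [Loshchilov and Hutter, 2017] from plain Adam can be sketched as a single scalar update step. The learning rate (1e-4) and weight decay (0.01) below match the paper's reported setup; beta1, beta2, and eps are the common defaults and are assumptions, not values stated in the paper:

```python
import math

def adamw_step(w, grad, m, v, t,
               lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One scalar AdamW update. Note wd multiplies w directly
    (decoupled decay) rather than being folded into the gradient."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```

With the paper's fixed learning rate, a weight of 1.0 and gradient of 0.5 on the first step shrink by roughly lr * (1 + wd), since the bias-corrected ratio is close to 1 for the first update.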