PatternCIR Benchmark and TisCIR: Advancing Zero-Shot Composed Image Retrieval in Remote Sensing
Authors: Zhechun Liang, Tao Huang, Fangfang Wu, Shiwen Xue, Zhenyu Wang, Weisheng Dong, Xin Li, Guangming Shi
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, our TisCIR demonstrated excellent performance in RSCIR, surpassing ZS-CIR methods applied to natural images and achieving state-of-the-art performance. Additionally, we validated the effect of the FGIA module by visualizing the image embedding attention. We confirmed that the module can effectively remove the information that needs to be replaced while retaining the fine-grained information in the rest of the image. |
| Researcher Affiliation | Academia | Zhechun Liang¹, Tao Huang¹, Fangfang Wu¹, Shiwen Xue¹, Zhenyu Wang¹, Weisheng Dong¹³, Xin Li², and Guangming Shi¹³ — ¹Xidian University; ²State University of New York at Albany; ³Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education |
| Pseudocode | No | The paper describes the methodology for ZS-QTG and TisCIR in detail, including mathematical formulations and workflow diagrams (Fig. 2, 3, 4, 5). However, it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The data and code are available here. |
| Open Datasets | Yes | We propose a model for generating text queries, called the Zero-Shot Query Text Generator (ZS-QTG). Using ZS-QTG, we constructed PatternCIR, the first fine-grained composed image retrieval benchmark in RSCIR. |
| Dataset Splits | Yes | Finally, we have obtained a total of 17700 triplets, which are divided into test, validation, and training sets in a ratio of 1.5:1.5:7. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and refers to models like CLIP and RemoteCLIP, but does not provide specific version numbers for software libraries, programming languages (e.g., Python, PyTorch), or other ancillary software components. |
| Experiment Setup | Yes | We use the AdamW optimizer [Loshchilov and Hutter, 2017] with a fixed learning rate of 0.0001, a weight decay of 0.01, and a batch size of 256 for the first stage of training, whereas a batch size of 32 is used for the second stage. Dropout with a probability of 50% is applied for regularization. |
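The 1.5:1.5:7 split of the 17,700 PatternCIR triplets can be sanity-checked with a short sketch. The per-split counts below are an assumption derived from the stated ratio, not figures reported in the paper:

```python
# Sanity-check the PatternCIR split sizes implied by the stated
# 1.5 : 1.5 : 7 ratio over 17,700 triplets.
total = 17700
ratio = {"test": 1.5, "val": 1.5, "train": 7.0}
denom = sum(ratio.values())  # 10.0

# Each split gets its proportional share of the total triplet count.
splits = {name: round(total * r / denom) for name, r in ratio.items()}
print(splits)  # {'test': 2655, 'val': 2655, 'train': 12390}
```

The ratio sums to 10, so the split corresponds to 15% test, 15% validation, and 70% training, and the three counts add back up to 17,700.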
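For reference, the AdamW rule cited in the experiment setup applies weight decay directly to the parameter rather than folding it into the gradient. Below is a minimal pure-Python sketch of that decoupled update for a single scalar parameter, using the paper's learning rate (0.0001) and weight decay (0.01) as defaults; it illustrates the optimizer's rule and is not the authors' training code:

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter.

    Decoupled weight decay [Loshchilov and Hutter, 2017]: the decay term
    is added to the update directly, instead of being mixed into the
    gradient as in Adam with L2 regularization.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction, step t >= 1
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v
```

In practice this corresponds to `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)`, the configuration the paper reports.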