Language-Guided Hybrid Representation Learning for Visual Grounding on Remote Sensing Images

Authors: Biao Liu, Xu Liu, Lingling Li, Licheng Jiao, Fang Liu, Xinyu Sun, Youlin Huang

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "4 Experiments", "4.1 Datasets ...", "4.4 Compared with State-of-the-Art Method ...", "4.6 Ablation Study ...", "Table 1: Compared with state-of-the-art methods on the DIOR-RSVG test set ...", "Table 2: Compared with state-of-the-art methods on the OPT-RSVG test set"
Researcher Affiliation | Academia | Biao Liu¹, Xu Liu¹, Lingling Li¹, Licheng Jiao¹, Fang Liu¹, Xinyu Sun¹ and Youlin Huang²; ¹Xidian University, ²East China Jiaotong University
Pseudocode | No | The paper describes the proposed method using block diagrams (Figures 1, 2, and 3) and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | "4.1 Datasets 1) DIOR-RSVG: The DIOR-RSVG dataset [Zhan et al., 2023] was constructed based on DIOR [Li et al., 2020]. ... 2) OPT-RSVG: OPT-RSVG [Li et al., 2024] includes a wider range of target scales..."
Dataset Splits | Yes | "In the released version of DIOR-RSVG we used, the split ratios for the training set, validation set, and test set image-text pairs are 70%, 10%, and 20%, respectively. ... The official division ratios for the training set, validation set, and test set provided by the OPT-RSVG dataset are 40%, 10%, and 50%, respectively."
Hardware Specification | Yes | "During the training process, we conducted distributed training on four RTX 3090 GPUs (24 GB VRAM)."
Software Dependencies | No | "The LGFormer we proposed is implemented using Pytorch, just like the other deep learning models we compared it with. ... we use the pre-trained BERT-base as the text encoder." No specific version numbers for PyTorch or other libraries are provided.
Experiment Setup | Yes | "In all training sessions, we set the batch size to 2 per GPU and used AdamW as the optimizer. The initial learning rate for the text encoder BERT is 1e-5, and the rest are 1e-4. On the DIOR-RSVG dataset, we conducted a total of 40 training epochs. ... For the OPT-RSVG dataset, we fine-tuned the model trained on the DIOR-RSVG training set for 15 epochs."
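The two-rate optimizer setup quoted above (1e-5 for the BERT text encoder, 1e-4 for everything else) is typically expressed as AdamW parameter groups. The following is a minimal sketch of that split; the paper does not show its code, so the name-matching rule (`"bert"` as a substring of the parameter name) and the helper name `build_param_groups` are our assumptions.

```python
# Hedged sketch of the reported optimizer setup: AdamW with a lower
# learning rate for the pre-trained BERT text encoder (1e-5) than
# for the rest of the model (1e-4). The substring rule "bert" used
# to identify encoder parameters is an assumption, not from the paper.

BERT_LR = 1e-5   # text encoder (pre-trained BERT-base)
BASE_LR = 1e-4   # all other parameters

def build_param_groups(named_params):
    """Split (name, param) pairs into two parameter groups with distinct LRs."""
    bert_params, other_params = [], []
    for name, param in named_params:
        (bert_params if "bert" in name else other_params).append(param)
    return [
        {"params": bert_params, "lr": BERT_LR},
        {"params": other_params, "lr": BASE_LR},
    ]

# In PyTorch this list would be passed directly to the optimizer, e.g.:
#   optimizer = torch.optim.AdamW(build_param_groups(model.named_parameters()))

if __name__ == "__main__":
    demo = [("bert.encoder.layer.0.weight", "p0"), ("fusion.head.weight", "p1")]
    groups = build_param_groups(demo)
    print(groups[0]["lr"], groups[1]["lr"])  # 1e-05 0.0001
```

PyTorch optimizers accept such a list of dicts, applying each group's `lr` to its parameters while sharing the remaining defaults.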