Language-Guided Hybrid Representation Learning for Visual Grounding on Remote Sensing Images
Authors: Biao Liu, Xu Liu, Lingling Li, Licheng Jiao, Fang Liu, Xinyu Sun, Youlin Huang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments 4.1 Datasets ... 4.4 Compared with State-of-the-Art Method ... 4.6 Ablation Study ... Table 1: Compared with state-of-the-art methods on the DIOR-RSVG test set ... Table 2: Compared with state-of-the-art methods on the OPT-RSVG test set |
| Researcher Affiliation | Academia | Biao Liu1 , Xu Liu1 , Lingling Li1 , Licheng Jiao1 , Fang Liu1 , Xinyu Sun1 and Youlin Huang2 1Xidian University 2East China Jiaotong University |
| Pseudocode | No | The paper describes the proposed method using block diagrams (Figure 1, 2, 3) and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | 4.1 Datasets 1) DIOR-RSVG: The DIOR-RSVG dataset [Zhan et al., 2023] was constructed based on DIOR [Li et al., 2020]. ... 2) OPT-RSVG: OPT-RSVG [Li et al., 2024] includes a wider range of target scales... |
| Dataset Splits | Yes | In the released version of DIOR-RSVG we used, the split ratios for the training set, validation set, and test set image-text pairs are 70%, 10%, and 20%, respectively. ... The official division ratios for the training set, validation set, and test set provided by the OPT-RSVG dataset are 40%, 10%, and 50%, respectively. |
| Hardware Specification | Yes | During the training process, we conducted distributed training on four RTX 3090 GPUs (24 GB VRAM). |
| Software Dependencies | No | The LGFormer we proposed is implemented using Pytorch, just like the other deep learning models we compared it with. ... we use the pre-trained BERT-base as the text encoder. No specific version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | In all training sessions, we set the batch size to 2 per GPU and used AdamW as the optimizer. The initial learning rate for the text encoder BERT is 1e-5, and the rest are 1e-4. On the DIOR-RSVG dataset, we conducted a total of 40 training epochs. ... For the OPT-RSVG dataset, we fine-tuned the model trained on the DIOR-RSVG training set for 15 epochs. |
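The reported optimizer schedule (AdamW, 1e-5 for the BERT text encoder, 1e-4 for the remaining modules) can be reproduced in PyTorch with per-parameter-group learning rates. This is a minimal sketch, not the authors' code: the two `nn.Linear` modules are hypothetical stand-ins for the actual BERT-base encoder and the rest of LGFormer.

```python
import torch
from torch import nn

# Hypothetical stand-ins so the sketch runs without external weights:
# in the paper, text_encoder would be pre-trained BERT-base and
# rest_of_model the visual backbone plus fusion/decoder modules.
text_encoder = nn.Linear(768, 256)
rest_of_model = nn.Linear(256, 4)

# Two parameter groups mirror the reported setup:
# 1e-5 for the text encoder, 1e-4 for everything else.
optimizer = torch.optim.AdamW([
    {"params": text_encoder.parameters(), "lr": 1e-5},
    {"params": rest_of_model.parameters(), "lr": 1e-4},
])

print([group["lr"] for group in optimizer.param_groups])
```

With a batch size of 2 per GPU across four GPUs, the effective global batch size during the reported distributed training would be 8.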