Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Authors: Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluation results of our method significantly outperform the baseline models, while there remains a considerable gap between our results and those achieved by human operators, underscoring the challenge presented by the UAV-Need-Help task. The paper includes a dedicated section '7 EXPERIMENTS' with subsections like '7.2 QUANTITATIVE RESULT', and presents performance tables such as 'Table 2: Results on Test Seen Set across different assistant levels', all of which are characteristic of experimental research. |
| Researcher Affiliation | Academia | All listed affiliations, including 'Institute of Artificial Intelligence, Beihang University', 'Hangzhou International Innovation Institute of Beihang University', 'MMLab, CUHK', and 'Centre for Perceptual and Interactive Intelligence', are academic institutions. The email addresses also align with academic affiliations. |
| Pseudocode | No | The paper describes methods through text and block diagrams (e.g., Figure 3 (b) for UAV Navigation LLM framework) but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' sections, nor are there any structured, code-like procedural descriptions. |
| Open Source Code | Yes | The abstract states, 'The project homepage can be accessed at https://prince687028.github.io/Travel.' Additionally, Section 3, 'TRAVEL SIMULATION PLATFORM', explicitly states, 'As illustrated in Fig. 1, the TRAVEL simulation platform is a fully open-source platform devoted to realistic UAV VLN tasks, integrating three modules: environment composition, flight simulation, and algorithmic support to achieve comprehensive functionality.' |
| Open Datasets | No | The paper describes the construction of a new dataset: 'We further construct a target-oriented VLN dataset consisting of approximately 12k trajectories on this platform, serving as the first dataset specifically designed for realistic UAV VLN tasks.' However, it does not provide a direct link, DOI, or repository for accessing this dataset, nor does it cite a published resource with access information. |
| Dataset Splits | Yes | Section 4.2, 'DATA ANALYSIS', under 'Dataset Split', specifies the exact distribution: 'Train 9152 trajectories with 76 objects across 20 scenes as the training set. Test Seen 1410 trajectories generated using objects and scenes seen in the training set. Test Unseen Map 958 trajectories with 2 scenes unseen in the training set. Test Unseen Object 629 trajectories with 13 objects unseen in the training set.' |
| Hardware Specification | Yes | Section 3.3, 'ALGORITHMIC SUPPORT', mentions, 'With 8 NVIDIA A100 GPUs, the simulation of a single UAV achieves a performance boost of 16 times...' Furthermore, Section C, 'EXPERIMENTAL DETAILS', reiterates, 'The MLLM model is trained on 8 NVIDIA A100 GPUs with a batch size of 128 for 2 epochs...' |
| Software Dependencies | Yes | The paper states: 'We utilize UE4's realistic rendering capabilities... We integrate the AirSim plugin to translate trajectory sequences into continuous paths with realistic flight dynamics'. The mention of 'UE4' (Unreal Engine 4) provides a specific major version number for a key software component. |
| Experiment Setup | Yes | Section C, 'EXPERIMENTAL DETAILS', provides extensive details: 'During the training of the MLLM, we freeze most of the model's parameters and only compute gradients on the visual projector, trajectory prediction head, and LoRA layers... We supervise the predicted 3D angles using cosine similarity loss and apply L1 loss between the predicted waypoints and the ground truth... The MLLM model is trained on 8 NVIDIA A100 GPUs with a batch size of 128 for 2 epochs, while the fine-grained model and CMA model are trained with a batch size of 128 for 10 epochs. We use Adam optimizer with a one-cycle learning rate decay schedule to train all models, where we set the maximum learning rate to 5e-4.' |
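As a quick consistency check on the split sizes quoted under 'Dataset Splits', the four reported trajectory counts can be summed to verify the paper's 'approximately 12k trajectories' claim (the dictionary keys below are illustrative labels, not names from the paper):

```python
# Trajectory counts per split, as reported in Section 4.2 of the paper
splits = {
    "train": 9152,              # 76 objects across 20 scenes
    "test_seen": 1410,          # objects and scenes seen in training
    "test_unseen_map": 958,     # 2 scenes unseen in training
    "test_unseen_object": 629,  # 13 objects unseen in training
}

total = sum(splits.values())
print(total)  # 12149, consistent with the "~12k trajectories" claim
```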
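The supervision described under 'Experiment Setup' (a cosine similarity loss on predicted 3D angles plus an L1 loss on predicted waypoints) can be sketched as follows. This is a minimal illustration using plain Python lists; the function names, toy values, and the unweighted sum of the two terms are assumptions, not details from the paper:

```python
import math

def cosine_loss(pred, gt):
    """1 - cosine similarity between a predicted and ground-truth 3D angle vector."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    return 1.0 - dot / norm

def l1_loss(pred, gt):
    """Mean absolute error between predicted and ground-truth waypoint coordinates."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

# Toy example: a perfectly predicted angle contributes zero cosine loss,
# so the total reduces to the waypoint L1 term.
angle_pred, angle_gt = [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]
wp_pred, wp_gt = [1.0, 2.0, 3.0], [1.5, 2.0, 2.5]
total = cosine_loss(angle_pred, angle_gt) + l1_loss(wp_pred, wp_gt)
```

In practice both terms would be computed on batched tensors and weighted before backpropagation; the paper does not report the relative weighting.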