Game4Loc: A UAV Geo-Localization Benchmark from Game Data
Authors: Yuxiang Ji, Boyong He, Zhuoyue Tan, Liaoni Wu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of our data and training method for UAV geo-localization, as well as the generalization capabilities to real-world scenarios. For our GTA-UAV dataset, we compare the proposed method with previous SOTA training methods under both cross-area and same-area settings, using positive + semi-positive and positive-only as training data respectively. As shown in Tab. 2, in the proposed partial matching settings, our proposed weighted-InfoNCE achieves the best results across all metrics. |
| Researcher Affiliation | Academia | Yuxiang Ji1*, Boyong He1*, Zhuoyue Tan1, Liaoni Wu1,2 1Institute of Artificial Intelligence, Xiamen University 2School of Aerospace Engineering, Xiamen University |
| Pseudocode | Yes | Algorithm 1: Mutually Exclusive Sampling process |
| Open Source Code | No | Project Page https://yux1angji.github.io/game4loc/. The paper mentions a project page, which is a high-level overview, but does not provide a direct link to a code repository or explicitly state that the source code for the methodology is released. |
| Open Datasets | No | In this work, we construct a large-range contiguous area UAV geo-localization dataset named GTA-UAV, featuring multiple flight altitudes, attitudes, scenes, and targets using modern computer games. Based on this dataset, we introduce a more practical UAV geo-localization task including partial matches of cross-view paired data, and expand the image-level retrieval to the actual localization in terms of distance (meters). Project Page https://yux1angji.github.io/game4loc/. The paper describes the construction of the GTA-UAV dataset and mentions a project page, but does not provide a direct URL, DOI, or specific repository name for accessing the dataset itself within the paper. |
| Dataset Splits | Yes | Based on this, we introduce two application scenarios as the same in VIGOR (Zhu, Yang, and Chen 2021): same area and cross area. The same area represents the scenario where both the training and the testing data pairs are sampled from the same area, reflecting applications where the flight area data is available. The cross area represents the case that the training and testing data are separated. Under this setting, we divide half of the game map into training data and evaluate on the other half, and these areas differ in their scenes. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments. |
| Software Dependencies | No | In our experiments the ViT-Base (Dosovitskiy et al. 2021) with patch size 16 × 16 and 64M parameters is adopted as the image encoding architecture. ... We follow the training approach using Symmetric InfoNCE from Sample4Geo (Deuser, Habel, and Oswald 2023) as the baseline, leveraging all available negatives in batch learning. ... we employ the Adam optimizer (Kingma and Ba 2017) with an initial learning rate of 0.0001 and a cosine learning rate scheduler to train each experiment for 20 epochs with a batch size of 64. The paper mentions models, optimizers, and schedulers but does not specify software versions for programming languages or libraries. |
| Experiment Setup | Yes | In our experiments the ViT-Base (Dosovitskiy et al. 2021) with patch size 16 × 16 and 64M parameters is adopted as the image encoding architecture. Both drone-view images and satellite-view images are resized to 384 × 384 before feeding into the network. The hyper-parameter k of weighted-InfoNCE is set to 5 as default, and the learnable temperature parameter τ is initialized to 1. Following Sample4Geo (Deuser, Habel, and Oswald 2023), we employ the Adam optimizer (Kingma and Ba 2017) with an initial learning rate of 0.0001 and a cosine learning rate scheduler to train each experiment for 20 epochs with a batch size of 64. Flipping, rotation, and grid dropout are included as data augmentation for training. |
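To make the reported setup concrete, below is a minimal PyTorch sketch of the baseline training configuration as described: a symmetric InfoNCE loss over all in-batch negatives (the Sample4Geo baseline; the paper's weighted-InfoNCE additionally handles semi-positive pairs, and that weighting is omitted here), Adam with an initial learning rate of 1e-4, a cosine scheduler over 20 epochs, and a learnable temperature initialized to 1. The `encoder` is a stand-in placeholder, not the actual ViT-Base; this is an assumption-laden sketch, not the authors' released code.

```python
import torch
import torch.nn.functional as F


def symmetric_infonce(drone_emb, sat_emb, logit_scale):
    """Symmetric InfoNCE over all in-batch negatives (Sample4Geo baseline).

    Note: the paper's weighted-InfoNCE additionally weights semi-positive
    cross-view pairs (hyper-parameter k = 5); that part is not shown here.
    """
    drone_emb = F.normalize(drone_emb, dim=-1)
    sat_emb = F.normalize(sat_emb, dim=-1)
    # Pairwise similarity logits, scaled by the learnable temperature.
    logits = logit_scale * drone_emb @ sat_emb.t()
    # Matched pairs lie on the diagonal of the similarity matrix.
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Stand-in encoder (the paper uses ViT-Base, patch size 16, 384x384 inputs).
encoder = torch.nn.Linear(8, 4)
# Learnable temperature, initialized to 1 as reported.
logit_scale = torch.nn.Parameter(torch.ones(()))
# Adam, lr 1e-4, cosine schedule over 20 epochs, batch size 64 (as reported).
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + [logit_scale], lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
```

A single training step would compute the loss on a batch of 64 drone/satellite embedding pairs, call `loss.backward()` and `optimizer.step()`, and step the scheduler once per epoch.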