Semantic Ambiguity Modeling and Propagation for Fine-Grained Visual Cross-View Geo-Localization

Authors: Mingtao Feng, Fenghao Tian, Jianqiao Luo, Zijie Wu, Weisheng Dong, Yaonan Wang, Ajmal Saeed Mian

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our method improves the overall performance of the joint tasks, achieving state-of-the-art results on the VIGOR and CVACT datasets." From the ablation studies: "Semantic Ambiguity Modeling. To demonstrate that our framework can properly model the semantic ambiguity of samples, we visualize the predicted localization, uncertainty score, and IOU (between the query and reference images) for each sample."
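The ablation quote refers to the IoU between query and reference image footprints. The paper does not give its formula, but for axis-aligned image regions the standard computation is the following sketch (illustrative only; box layout and coordinate conventions are assumptions, not code from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes,
    each given as (left, top, right, bottom)."""
    # Overlap extents, clamped at zero when the boxes are disjoint.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Identical boxes give 1.0; partially overlapping boxes give 1/7 here.
full = iou((0, 0, 1, 1), (0, 0, 1, 1))      # 1.0
partial = iou((0, 0, 2, 2), (1, 1, 3, 3))   # 1/7
```

An uncertainty score can then be compared against this IoU per sample, as the visualization in the quote describes.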
Researcher Affiliation | Academia | Xidian University; Hunan University; University of Western Australia
Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions of the steps involved, but it does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/Afoolbird/SAMP
Open Datasets | Yes | "Extensive experiments demonstrate that our method improves the overall performance of the joint tasks, achieving state-of-the-art results on the VIGOR and CVACT datasets." "We use two benchmark datasets. VIGOR: ... CVACT: Following the sampling strategy for practical scenarios in VIGOR (Zhu et al. 2021a), we crop the CVACT (Liu and Li 2019) dataset randomly, and define the positive and semi-positive samples."
Dataset Splits | Yes | "We follow the same-area and cross-area splits from (Zhu et al. 2021a). In a training data unit, each query image corresponds to one positive reference image and one randomly selected from the three sampled semi-positive samples. To simulate practical scenarios, we resample the reference image into five patches (one central and four corners) each covering the query location. The central patch is treated as a positive reference, while the corner ones are considered as semi-positive, similar to the VIGOR (Zhu et al. 2021a) setting."
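The five-patch resampling in the quote (one central positive patch plus four corner semi-positives) can be sketched as crop offsets over the reference image. This is a minimal illustration under assumed geometry; the paper's exact crop sizes and overlap constraints may differ:

```python
def five_patch_crops(ref_w, ref_h, patch_w, patch_h):
    """Return (left, top) crop offsets for one central patch and
    four corner patches of a reference image, VIGOR-style.
    The central patch is the positive reference; the corner
    patches are the semi-positives."""
    cx = (ref_w - patch_w) // 2
    cy = (ref_h - patch_h) // 2
    return {
        "center": (cx, cy),                                  # positive
        "top_left": (0, 0),                                  # semi-positive
        "top_right": (ref_w - patch_w, 0),                   # semi-positive
        "bottom_left": (0, ref_h - patch_h),                 # semi-positive
        "bottom_right": (ref_w - patch_w, ref_h - patch_h),  # semi-positive
    }

# Example: 640x640 reference resampled into 320x320 patches.
crops = five_patch_crops(640, 640, 320, 320)
```

With these offsets each of the five patches still covers the query location whenever the query lies near the reference center, matching the sampling described in the quote.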
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models, memory, or other machine specifications used for running the experiments.
Software Dependencies | No | "Following VIGOR, the VGG16 (Simonyan and Zisserman 2014) is adopted as the backbone feature extractor and 8 SAFA blocks (Shi et al. 2019) are used." The paper names architectural components but does not provide software library versions (e.g., PyTorch, TensorFlow, or CUDA versions) or other system-level dependencies with version numbers.
Experiment Setup | No | "Following VIGOR, the VGG16 (Simonyan and Zisserman 2014) is adopted as the backbone feature extractor and 8 SAFA blocks (Shi et al. 2019) are used." "In a training data unit, each query image corresponds to one positive reference image and one randomly selected from the three sampled semi-positive samples." The main text mentions the architecture, a temperature parameter τ, and positive parameters α, β, and γ for the decaying function δ(ε), but omits other hyperparameters such as the learning rate, batch size, and optimizer. It states "More details are in the supplementary material."
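The quoted setup names a temperature parameter τ without stating how it enters the loss. In contrastive retrieval objectives, τ conventionally scales similarity logits before a softmax, with smaller τ sharpening the match distribution. The sketch below is this generic convention, not the paper's actual loss:

```python
import math

def softmax_with_temperature(logits, tau):
    """Temperature-scaled softmax over similarity logits.
    Subtracting the max logit keeps exp() numerically stable;
    smaller tau makes the output distribution sharper."""
    m = max(logits)
    exps = [math.exp((x - m) / tau) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Equal logits yield a uniform distribution regardless of tau.
uniform = softmax_with_temperature([1.0, 1.0], 0.5)
# A lower temperature concentrates mass on the larger logit.
sharp = softmax_with_temperature([2.0, 0.0], 0.1)
soft = softmax_with_temperature([2.0, 0.0], 1.0)
```

Since the paper defers the exact values of τ, α, β, and γ to its supplementary material, none of the constants above should be read as the authors' settings.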