Understanding Matters: Semantic-Structural Determined Visual Relocalization for Large Scenes

Authors: Jingyi Nie, Liangliang Cai, Qichuan Geng, Zhong Zhou

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. Evidence: "Experimental results on the Cambridge Landmarks dataset demonstrate that the proposed method achieves significant improvements with fewer training costs on large-scale scenes, reducing the median error by 38% compared to the state-of-the-art SCR method DSAC*." The paper further reports a quantitative comparison of position error and mapping time against state-of-the-art SCR and feature-matching methods on Cambridge Landmarks (Figure 1), 7Scenes and 12Scenes results as the percentage of frames below a 5 cm, 5° pose error (Table 1), median rotation and position errors on Cambridge Landmarks (Table 2), and an ablation study of the main design choices, modules, and training strategies on Cambridge Landmarks (Section 4.4).
Researcher Affiliation: Academia. 1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; 2) The Information Engineering College, Capital Normal University, Beijing, China; 3) Zhongguancun Laboratory, Beijing, China. EMAIL, EMAIL, EMAIL
Pseudocode: No. The paper describes its methods using prose and mathematical equations. There are no sections or figures labeled "Pseudocode" or "Algorithm", nor any structured, code-like blocks detailing a procedure step by step.
Open Source Code: Yes. "Code is available: https://gitee.com/VR NAVE/ss-dvr."
Open Datasets: Yes. "We conduct our experiments on 7Scenes [Shotton et al., 2013], 12Scenes [Valentin et al., 2016], Cambridge Landmarks [Kendall et al., 2015] and Wayspots [Brachmann et al., 2023]."
Dataset Splits: No. The paper mentions several datasets (7Scenes, 12Scenes, Cambridge Landmarks, Wayspots) and describes their content, but it does not specify how they were split into training, validation, or test sets (e.g., exact percentages or sample counts per split), nor does it cite a standard split or provide files for custom splits.
Hardware Specification: Yes. "We compare mapping times of ACE, GLACE, DSAC* and ours on NVIDIA GeForce RTX 2080 Ti."
Software Dependencies: No. "We implement our method in PyTorch, using the backbone of ACE [Brachmann et al., 2023] as the feature extractor. We implement mini-batch k-means with CUDA." The paper names PyTorch and CUDA but does not specify their version numbers, which are required for a reproducible description of ancillary software.
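The mini-batch k-means the paper mentions is a standard technique even though its CUDA implementation is not shown. A minimal CPU/NumPy sketch of the usual mini-batch update (per-center learning rate 1/count) is below; the function name and parameters are assumptions for illustration, not the authors' code:

```python
import numpy as np

def minibatch_kmeans(features, k, batch_size=1024, iters=100, seed=0):
    """Mini-batch k-means sketch. The paper runs this on CUDA over dense
    feature maps; this NumPy port only illustrates the update rule."""
    rng = np.random.default_rng(seed)
    # initialize centers from k random samples
    centers = features[rng.choice(len(features), k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(iters):
        batch = features[rng.choice(len(features), batch_size, replace=False)]
        # assign each sample in the batch to its nearest center
        dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # move each center toward its samples with a 1/count learning rate
        for j, x in zip(assign, batch):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return centers
```

Because each update is a convex combination of the old center and a data point, centers always stay within the convex hull of the features.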
Experiment Setup: Yes. "The batch size is set to 400K, and the maximum number of iterations is 200. We cluster 64 categories at the structural level and 2 classes at the semantic level. To better adapt to the scale of the scene, we define α in Equation 9 as 20, which yields relatively good results for both large and small scenes. Additionally, we set β in Equation 15 to 0.1. γ is defined as 100 to encourage the network to focus on label prediction accuracy."
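For reference, the hyperparameters reported above can be collected into one configuration block. This is a hypothetical sketch; the key names are assumptions and do not come from the paper's released code:

```python
# Hypothetical config mirroring the hyperparameters reported in the paper.
config = {
    "batch_size": 400_000,      # 400K samples per batch
    "max_iters": 200,           # maximum number of training iterations
    "structural_clusters": 64,  # k-means categories at the structural level
    "semantic_classes": 2,      # classes at the semantic level
    "alpha": 20,                # Eq. 9: scene-scale adaptation weight
    "beta": 0.1,                # Eq. 15 loss weight
    "gamma": 100,               # weight emphasizing label-prediction accuracy
}
```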