Understanding Matters: Semantic-Structural Determined Visual Relocalization for Large Scenes
Authors: Jingyi Nie, Liangliang Cai, Qichuan Geng, Zhong Zhou
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Cambridge Landmarks dataset demonstrate that the proposed method achieves significant improvements with fewer training costs on large-scale scenes, reducing the median error by 38% compared to the state-of-the-art SCR method DSAC*. Figure 1: quantitative comparison of position error and mapping time of state-of-the-art SCR and feature-matching methods on the large-scale outdoor dataset Cambridge Landmarks. Table 1: 7Scenes and 12Scenes results, reporting the percentage of frames below a 5 cm, 5° pose error. Table 2: Cambridge Landmarks results, reporting median rotation and position errors. Section 4.4: ablation studies on the main design choices, modules, and training strategies on the Cambridge Landmarks dataset. |
| Researcher Affiliation | Academia | 1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; 2 The Information Engineering College, Capital Normal University, Beijing, China; 3 Zhongguancun Laboratory, Beijing, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods using prose and mathematical equations. There are no explicit sections or figures labeled "Pseudocode" or "Algorithm", nor are there any structured, code-like blocks detailing a procedure step-by-step. |
| Open Source Code | Yes | Code is available: https://gitee.com/VR NAVE/ss-dvr. |
| Open Datasets | Yes | We conduct our experiments on 7Scenes [Shotton et al., 2013], 12Scenes [Valentin et al., 2016], Cambridge Landmarks [Kendall et al., 2015] and Wayspots [Brachmann et al., 2023]. |
| Dataset Splits | No | The paper mentions several datasets (7Scenes, 12Scenes, Cambridge Landmarks, Wayspots) and describes their content. However, it does not provide specific details on how these datasets were split into training, validation, or test sets for the experiments (e.g., exact percentages or sample counts for each split), nor does it explicitly reference a standard split with a citation or provide files for custom splits. |
| Hardware Specification | Yes | We compare mapping times of ACE, GLACE, DSAC* and ours on NVIDIA GeForce RTX 2080 Ti. |
| Software Dependencies | No | We implement our method in PyTorch, using the backbone of ACE [Brachmann et al., 2023] as the feature extractor. We implement mini-batch k-means with CUDA. The paper names PyTorch and CUDA but does not specify their version numbers, which are required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | The batch size is set to 400K, and the maximum number of iterations is 200. We cluster 64 categories at the structural level and 2 classes at the semantic level. To better adapt to the scale of the scene, we define α in Equation 9 as 20, which yields relatively good results for both large and small scenes. Additionally, we set β in Equation 15 to 0.1. γ is defined as 100 to encourage the network to focus on label prediction accuracy. |
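The paper states that structural-level labels come from a CUDA mini-batch k-means with 64 clusters, but does not release pseudocode for it. The sketch below is a minimal NumPy stand-in (not the authors' CUDA implementation) illustrating the standard mini-batch k-means update; the function name, batch size, and iteration count here are illustrative placeholders, with `k=64` matching the paper's structural-level cluster count.

```python
import numpy as np

def minibatch_kmeans(features, k=64, batch_size=4096, iters=200, seed=0):
    """Mini-batch k-means sketch (NumPy stand-in for the paper's CUDA version).

    features: (N, D) array of per-pixel feature descriptors.
    k: number of clusters (the paper uses 64 at the structural level).
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct random samples.
    centers = features[rng.choice(len(features), k, replace=False)].astype(np.float64)
    counts = np.zeros(k)  # per-center sample counts, used as a decaying learning rate

    for _ in range(iters):
        batch = features[rng.choice(len(features), batch_size, replace=False)]
        # Assign each batch point to its nearest centroid.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Per-center update: move each centroid toward the mean of its
        # assigned batch points, with step size shrinking as counts grow.
        for j in np.unique(labels):
            pts = batch[labels == j]
            counts[j] += len(pts)
            lr = len(pts) / counts[j]
            centers[j] = (1 - lr) * centers[j] + lr * pts.mean(0)
    return centers
```

Note that the "batch size of 400K" in the setup row most likely refers to the pixel batch of the ACE-style scene-coordinate training loop rather than this clustering step, so the `batch_size` default above should not be read as the paper's value.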