GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Ming Cheng, Zirui Wang, Victor Prisacariu, Tristan Braud

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive quantitative evaluations and ablation studies on the 7Scenes (Glocker et al., 2013; Shotton et al., 2013), 12Scenes (Valentin et al., 2016), and Cambridge Landmarks (Kendall et al., 2015) benchmarks. GS-CPR significantly enhances the pose estimation accuracy of both APR and SCR methods across these benchmarks, achieving new state-of-the-art accuracy on the two indoor datasets.
Researcher Affiliation | Academia | Changkun Liu (1), Shuai Chen (2), Yash Bhalgat (2), Siyan Hu (1), Ming Cheng (3), Zirui Wang (2), Victor Adrian Prisacariu (2), Tristan Braud (1); (1) HKUST, (2) University of Oxford, (3) Dartmouth College.
Pseudocode | No | The paper describes the methodology through textual explanations and mathematical equations, but it does not contain any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | The project page is available at: https://xrim-lab.github.io/GS-CPR/.
Open Datasets | Yes | We evaluate the performance of GS-CPR across three widely used public visual localization datasets. The 7Scenes dataset (Glocker et al., 2013; Shotton et al., 2013) comprises seven indoor scenes with volumes ranging from 1–18 m³. The 12Scenes dataset (Valentin et al., 2016) features 12 larger indoor scenes, with volumes spanning 14–79 m³. The Cambridge Landmarks dataset (Kendall et al., 2015) represents large-scale outdoor scenarios, characterized by challenges such as moving objects and varying lighting conditions between query and training images.
Dataset Splits | Yes | Evaluation Metrics. We report two types of metrics to compare the performance of different methods. The first metric is the median translation and rotation error. The second metric is the recall rate, which measures the percentage of test images localized within a cm and b°. For both the 7Scenes and 12Scenes datasets, we adopt the SfM ground truth (GT) provided by Brachmann et al. (2021).
Hardware Specification | Yes | We evaluate the runtime of the proposed framework using an NVIDIA GeForce RTX 4090 GPU. The modified Scaffold-GS model is trained for each scene with 30,000 iterations on an NVIDIA A6000 GPU.
Software Dependencies | No | The paper mentions software like PyTorch (Paszke et al., 2019), MASt3R (Leroy et al., 2024), and Scaffold-GS (Lu et al., 2024), but it does not provide specific version numbers for these software components, which are required for full reproducibility.
Experiment Setup | Yes | The modified Scaffold-GS model is trained for each scene with 30,000 iterations on an NVIDIA A6000 GPU. For the exposure-adaptive ACT module, we follow the default setting in Chen et al. (2024a), computing the query image's histogram in the YUV color space and binning the luminance channel into 10 bins. We employ the official pre-trained MASt3R (Leroy et al., 2024) model without fine-tuning for 2D-2D matching and resize all images to 512 pixels on their largest dimension.
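The evaluation metrics quoted in the table (median translation/rotation error and recall within a cm and b°) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: `pose_errors` and `summarize` are hypothetical helper names, and the 5 cm / 5° thresholds are example values standing in for a and b.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (same units as t) and rotation error in degrees
    between an estimated and a ground-truth camera pose."""
    t_err = np.linalg.norm(t_est - t_gt)
    # Rotation error: angle of the relative rotation R_est^T @ R_gt.
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return t_err, r_err

def summarize(errors, t_thresh_cm=5.0, r_thresh_deg=5.0):
    """Median errors plus recall: the fraction of query images localized
    within the (cm, degree) thresholds. `errors` holds (metres, degrees)."""
    t_errs, r_errs = zip(*errors)
    t_cm = np.asarray(t_errs) * 100.0  # metres -> centimetres
    r_deg = np.asarray(r_errs)
    recall = np.mean((t_cm <= t_thresh_cm) & (r_deg <= r_thresh_deg))
    return np.median(t_cm), np.median(r_deg), recall
```

The rotation error uses the standard trace identity for the angle of a relative rotation; clipping guards against floating-point values just outside [-1, 1].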
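The exposure-adaptive ACT input described in the Experiment Setup row (a 10-bin histogram of the luminance channel in YUV space) can be sketched like this. This is an assumption-laden illustration rather than the paper's implementation: the BT.601 luma weights, the unit-range binning, and the normalization step are my choices, not details taken from the paper.

```python
import numpy as np

def luminance_histogram(rgb, n_bins=10):
    """10-bin histogram of the Y (luma) channel of an RGB image,
    as a compact descriptor of the query image's exposure."""
    rgb = np.asarray(rgb, dtype=np.float32) / 255.0
    # BT.601 luma coefficients for the RGB -> Y conversion (assumed).
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    y = np.clip(y, 0.0, 1.0)  # guard against float rounding past the range
    hist, _ = np.histogram(y, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()  # normalize to a probability distribution
```

A dark, underexposed query concentrates mass in the low bins and a bright one in the high bins, which is what makes the histogram a useful exposure-conditioning signal.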