GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Ming Cheng, Zirui Wang, Victor Prisacariu, Tristan Braud
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive quantitative evaluations and ablation studies on the 7Scenes (Glocker et al., 2013; Shotton et al., 2013), 12Scenes (Valentin et al., 2016), and Cambridge Landmarks (Kendall et al., 2015) benchmarks. GS-CPR significantly enhances the pose estimation accuracy of both APR and SCR methods across these benchmarks, achieving new state-of-the-art accuracy on the two indoor datasets. |
| Researcher Affiliation | Academia | Changkun Liu1 Shuai Chen2 Yash Bhalgat2 Siyan Hu1 Ming Cheng3 Zirui Wang2 Victor Adrian Prisacariu2 Tristan Braud1 1HKUST 2University of Oxford 3Dartmouth College |
| Pseudocode | No | The paper describes the methodology through textual explanations and mathematical equations, but it does not contain any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | The project page is available at: https://xrim-lab.github.io/GS-CPR/. |
| Open Datasets | Yes | We evaluate the performance of GS-CPR across three widely used public visual localization datasets. The 7Scenes dataset (Glocker et al., 2013; Shotton et al., 2013) comprises seven indoor scenes with volumes ranging from 1–18 m³. The 12Scenes dataset (Valentin et al., 2016) features 12 larger indoor scenes, with volumes spanning from 14–79 m³. The Cambridge Landmarks dataset (Kendall et al., 2015) represents large-scale outdoor scenarios, characterized by challenges such as moving objects and varying lighting conditions between query and training images. |
| Dataset Splits | Yes | Evaluation Metrics. We report two types of metrics to compare the performance of different methods. The first metric is the median translation and rotation error. The second metric is the recall rate, which measures the percentage of test images localized within a cm and b°. For both the 7Scenes and 12Scenes datasets, we adopt the SfM ground truth (GT) provided by Brachmann et al. (2021). |
| Hardware Specification | Yes | We evaluate the runtime of the proposed framework using an NVIDIA GeForce RTX 4090 GPU. The modified Scaffold-GS model is trained for each scene with 30,000 iterations on an NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions software like PyTorch (Paszke et al., 2019), MASt3R (Leroy et al., 2024), and Scaffold-GS (Lu et al., 2024), but it does not provide specific version numbers for these software components, which are required for full reproducibility. |
| Experiment Setup | Yes | The modified Scaffold-GS model is trained for each scene with 30,000 iterations on an NVIDIA A6000 GPU. For the exposure-adaptive ACT module, we follow the default setting in Chen et al. (2024a), computing the query image's histogram in the YUV color space and binning the luminance channel into 10 bins. We employ the official pre-trained MASt3R (Leroy et al., 2024) model without fine-tuning for 2D-2D matching and resize all images to 512 pixels on their largest dimension. |
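The evaluation metrics quoted above (median translation/rotation error and recall within a cm / b°) are standard in visual localization and can be sketched as follows. This is an illustrative sketch, not code from the paper's release; `pose_errors` and `recall` are hypothetical helper names, and the rotation error is computed as the angle of the relative rotation between estimate and ground truth.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation error (same units as t) and rotation error in degrees."""
    t_err = float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))
    # Angle of the relative rotation R_est^T @ R_gt, via the trace identity
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return t_err, r_err

def recall(t_errs, r_errs, t_thresh, r_thresh_deg):
    """Fraction of test images within BOTH thresholds, e.g. 0.05 m / 5 deg."""
    t_errs, r_errs = np.asarray(t_errs), np.asarray(r_errs)
    return float(np.mean((t_errs <= t_thresh) & (r_errs <= r_thresh_deg)))
```

Median errors are then simply `np.median` over the per-image `pose_errors` outputs across the test set.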
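The preprocessing steps in the experiment setup row (resizing to 512 px on the longest side, and a 10-bin luminance histogram for the exposure-adaptive ACT module) could look roughly like the sketch below. Assumptions are labeled in the comments: BT.601 luma weights are assumed for the Y channel of YUV, and nearest-neighbor resampling stands in for whatever resize filter the authors actually use.

```python
import numpy as np

def resize_longest_side(img: np.ndarray, target: int = 512) -> np.ndarray:
    """Resize an HxWxC image so its longest side equals `target`.

    Nearest-neighbor sampling for simplicity; the paper does not state
    which resampling filter is used.
    """
    h, w = img.shape[:2]
    scale = target / max(h, w)
    rows = np.clip((np.arange(round(h * scale)) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(round(w * scale)) / scale).astype(int), 0, w - 1)
    return img[rows][:, cols]

def luminance_histogram(rgb: np.ndarray, bins: int = 10) -> np.ndarray:
    """Normalized histogram of the Y (luma) channel of an HxWx3 uint8 image.

    Assumes BT.601 luma weights for the RGB -> Y conversion.
    """
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    hist, _ = np.histogram(y, bins=bins, range=(0.0, 255.0))
    return hist / hist.sum()
```

For example, a 480x640 query image is rescaled to 384x512 (longest side 512), and its 10-bin luma histogram serves as the exposure descriptor fed to the ACT module.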