Enhancing Visual Localization with Cross-Domain Image Generation
Authors: Yuanze Wang, Yichao Yan, Shiming Song, Songchang Jin, Yilan Huang, Xingdong Sheng, Dianxi Shi
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method achieves state-of-the-art accuracy, outperforming existing cross-domain visual localization methods by an average of 59% across all domains. Project page: https://yzwang-sjtu.github.io/CDG-Loc. ... Section 5. Experiments |
| Researcher Affiliation | Collaboration | 1MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China 2Intelligent Game and Decision Lab (IGDL), Beijing, China 3Lenovo Research, Shanghai, China 4Department of Big Data Intelligence, Advanced Institute of Big Data, Beijing, 100195, China. |
| Pseudocode | No | The paper describes the methodology using prose and mathematical formulas, without explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://yzwang-sjtu.github.io/CDG-Loc. |
| Open Datasets | Yes | We conduct experiments using the large-scale dataset 360Loc (Huang et al., 2024), which includes dynamic objects and significant variations in lighting conditions. |
| Dataset Splits | Yes | We utilize even-indexed images for training the cross-domain 3DGS, while using odd-indexed images for evaluation purposes. |
| Hardware Specification | Yes | Training and evaluation are performed on an NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | Our method is built upon the widely-used open-source Scaffold-GS (Lu et al., 2024) codebase and employs MS-T (Shavit et al., 2021) for visual localization. No specific version numbers for these software components are provided. |
| Experiment Setup | Yes | During cross-domain 3DGS training, we scale images to 400×400 and train for 60,000 iterations in the first stage, followed by fine-tuning for 20,000 iterations in the second stage. For the editing model, we fine-tune it for 10,000 iterations at a resolution of 512×512. The number L of photometric embeddings is set to half the number of the mapping dataset and the parameter m is set to 24. During visual localization training, images from all domains are resized to 256×256. ... We set an initial learning rate of λ = 10⁻⁴ and a batch size of 32 for 300 epochs for both our method and the baselines. |
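The split and setup rows above are concrete enough to sketch in code. Below is a minimal, hypothetical illustration of the even/odd index split used for training versus evaluation, with the reported hyperparameters collected into one configuration dict for reference; the function and variable names (`split_by_parity`, `CONFIG`) are illustrative and do not come from the paper or its codebase.

```python
def split_by_parity(items):
    """Even-indexed items go to training, odd-indexed to evaluation,
    as described in the Dataset Splits row."""
    train = items[0::2]   # indices 0, 2, 4, ...
    eval_ = items[1::2]   # indices 1, 3, 5, ...
    return train, eval_

# Hyperparameters quoted in the Experiment Setup row, gathered for reference.
CONFIG = {
    "gs_resolution": (400, 400),   # cross-domain 3DGS training
    "gs_stage1_iters": 60_000,
    "gs_stage2_iters": 20_000,
    "edit_resolution": (512, 512), # editing-model fine-tuning
    "edit_iters": 10_000,
    "loc_resolution": (256, 256),  # visual localization training
    "learning_rate": 1e-4,
    "batch_size": 32,
    "epochs": 300,
}

images = [f"frame_{i:04d}.png" for i in range(6)]
train, evaluation = split_by_parity(images)
# train holds frames 0, 2, 4; evaluation holds frames 1, 3, 5
```

This is only a sketch of the split logic under the stated assumptions; the actual pipeline is built on the Scaffold-GS codebase.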