ConDo: Continual Domain Expansion for Absolute Pose Regression
Authors: Zijun Li, Zhipeng Cai, Bochun Yang, Xuelun Shen, Siqi Shen, Xiaoliang Fan, Michael Paulitsch, Cheng Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Large-scale benchmarks with various scene types are constructed to evaluate models under practical (long-term) data changes. ConDo consistently and significantly outperforms baselines across architectures, scene types, and data changes. ...Experiments validate the effectiveness of ConDo on different baseline architectures and data with both scene and pose changes. |
| Researcher Affiliation | Collaboration | 1 Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; 2 Intel Labs. {lizijun;yangbc;xuelun}@stu.xmu.edu.cn, {zhipeng.cai;michael.paulitsch}@intel.com, {siqishen;fanxiaoliang;cwang}@xmu.edu.cn |
| Pseudocode | No | The paper describes the methodology in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/ZijunLi7/ConDo |
| Open Datasets | Yes | To simulate novel poses and the sequentially revealed multiple scenes, we adopt standard APR datasets, namely, 7Scenes and Cambridge (Glocker et al. 2013; Kendall, Grimes, and Cipolla 2015). ...we utilize large-scale driving datasets with both significant lighting changes (daytime to night time) and long-term scene changes (spring to winter). Specifically, we take the Office Loop and Neighborhood, which are two large-scale scenes in 4Seasons (Wenzel et al. 2021) |
| Dataset Splits | Yes | To simulate the practical scenario, we split multiple scans of the same scene into training and inference and reveal the inference scans sequentially, i.e., every round of ConDo model update starts when a new inference scan is revealed. We randomly hold out 1/8 of the images in each scan (training and inference) and use them to evaluate the generalization of APR on the corresponding scan. To create challenging evaluation data, instead of holding out individual images uniformly distributed in each scan, we hold out several sets of images where each set is a continuous trajectory of the scan consisting of 16 images (see Fig. 3). The held-out evaluation data allow us to fully evaluate APR on images unseen both during normal training and ConDo. To simulate novel poses and the sequentially revealed multiple scenes, we adopt standard APR datasets, namely, 7Scenes and Cambridge (Glocker et al. 2013; Kendall, Grimes, and Cipolla 2015). These two datasets represent the case of indoor and outdoor scenes respectively, and different scans of the same scene contain distinct trajectories, which are suitable to evaluate the case of novel poses. We adopt the same training and inference split as in the baseline APR methods (Kendall, Grimes, and Cipolla 2015; Shavit, Ferens, and Keller 2021). |
| Hardware Specification | Yes | All models are trained using one RTX-4090 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Unless otherwise stated, the code and hyper-parameter settings of the baselines strictly follow the official code release. The original Pose-Transformer can use multiple regression heads and scene-dependent latent embeddings to handle multiple scenes. We only apply multiple regression heads since it is sufficient to achieve similar performance (Appendix A.4). APRs are first learned on training data until converging in the initial training. In the main experiment, we follow the setup of large-scale continual learning (Cai, Sener, and Koltun 2021) and limit the computation budget of ConDo by first identifying the budget b = (epochs × iterations per epoch × batch size) / \|S_Ω\| for the baseline APR model to converge on the initial training data S_Ω. b represents the average number of iterations required per image. Then for every round of ConDo update with N images newly revealed, we assign N·b / batch size training iterations (see Appendix A.2 for actual numbers of b) with the same batch size as the initial training, so that the whole ConDo procedure, including initial training and all ConDo updates, consumes roughly only the budget to train one APR model from scratch on all revealed data. |
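The compute-budget rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the arithmetic, not the paper's code; all function and variable names are mine:

```python
def initial_budget(epochs, iters_per_epoch, batch_size, n_initial_images):
    # b = (epochs * iterations per epoch * batch size) / |S_Omega|:
    # the average number of training iterations spent per image
    # during the initial training phase.
    return epochs * iters_per_epoch * batch_size / n_initial_images

def condo_update_iters(n_new_images, b, batch_size):
    # Each ConDo update round with N newly revealed images receives
    # N * b / batch_size iterations (same batch size as initial
    # training), so the total compute stays roughly equal to training
    # one APR model from scratch on all revealed data.
    return int(n_new_images * b / batch_size)

# Illustrative numbers (not from the paper; see Appendix A.2 for actual b):
b = initial_budget(epochs=300, iters_per_epoch=100,
                   batch_size=64, n_initial_images=6400)  # -> 300.0
iters = condo_update_iters(n_new_images=800, b=b, batch_size=64)  # -> 3750
```

Note that since iterations per epoch × batch size roughly equals the dataset size, b effectively reduces to the number of passes over the initial data, which is why it transfers naturally as a per-image budget to newly revealed images.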
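The hold-out scheme described in the Dataset Splits row (1/8 of each scan held out as continuous 16-image trajectories rather than scattered single frames) could be implemented along these lines. The paper does not release this exact splitting code; this is a hypothetical sketch with names of my choosing:

```python
import random

def hold_out_trajectories(image_ids, frac=1 / 8, traj_len=16, seed=0):
    """Split one scan into train / held-out eval images.

    Holds out roughly `frac` of the scan as continuous trajectories of
    `traj_len` consecutive images, mirroring the evaluation-split idea
    described in the paper (exact procedure is an assumption).
    """
    rng = random.Random(seed)
    n_held = int(len(image_ids) * frac)
    n_trajs = max(1, n_held // traj_len)
    # Sampled trajectory starts may overlap, in which case slightly
    # fewer than n_held images end up held out; acceptable for a sketch.
    starts = rng.sample(range(len(image_ids) - traj_len + 1), n_trajs)
    held = set()
    for s in starts:
        held.update(range(s, s + traj_len))
    eval_ids = [image_ids[i] for i in sorted(held)]
    train_ids = [im for i, im in enumerate(image_ids) if i not in held]
    return train_ids, eval_ids

# Example: a scan of 128 frames yields one contiguous 16-frame eval trajectory.
train, eval_set = hold_out_trajectories(list(range(128)))
```

Holding out contiguous trajectories is what makes the evaluation challenging: every eval frame is far (in trajectory space) from some training frame, instead of always having a near-identical training neighbor.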