ConDo: Continual Domain Expansion for Absolute Pose Regression
Authors: Zijun Li, Zhipeng Cai, Bochun Yang, Xuelun Shen, Siqi Shen, Xiaoliang Fan, Michael Paulitsch, Cheng Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Large-scale benchmarks with various scene types are constructed to evaluate models under practical (long-term) data changes. ConDo consistently and significantly outperforms baselines across architectures, scene types, and data changes. ...Experiments validate the effectiveness of ConDo on different baseline architectures and data with both scene and pose changes. |
| Researcher Affiliation | Collaboration | 1 Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; 2 Intel Labs. {lizijun;yangbc;xuelun}@stu.xmu.edu.cn, {zhipeng.cai;michael.paulitsch}@intel.com, {siqishen;fanxiaoliang;cwang}@xmu.edu.cn |
| Pseudocode | No | The paper describes the methodology in prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/ZijunLi7/ConDo |
| Open Datasets | Yes | To simulate novel poses and the sequentially revealed multiple scenes, we adopt standard APR datasets, namely, 7Scenes and Cambridge (Glocker et al. 2013; Kendall, Grimes, and Cipolla 2015). ...we utilize large-scale driving datasets with both significant lighting changes (daytime to night time) and long-term scene changes (spring to winter). Specifically, we take the Office Loop and Neighborhood, which are two large-scale scenes in 4Seasons (Wenzel et al. 2021) |
| Dataset Splits | Yes | To simulate the practical scenario, we split multiple scans of the same scene into training and inference and reveal the inference scans sequentially, i.e., every round of ConDo model update starts when a new inference scan is revealed. We randomly hold out 1/8 of the images in each scan (training and inference) and use them to evaluate the generalization of APR on the corresponding scan. To create challenging evaluation data, instead of holding out individual images uniformly distributed in each scan, we hold out several sets of images where each set is a continuous trajectory of the scan consisting of 16 images (see Fig. 3). The held-out evaluation data allow us to fully evaluate APR on images unseen both during normal training and ConDo. To simulate novel poses and the sequentially revealed multiple scenes, we adopt standard APR datasets, namely, 7Scenes and Cambridge (Glocker et al. 2013; Kendall, Grimes, and Cipolla 2015). These two datasets represent the case of indoor and outdoor scenes respectively, and different scans of the same scene contain distinct trajectories, which are suitable to evaluate the case of novel poses. We adopt the same training and inference split as in the baseline APR methods (Kendall, Grimes, and Cipolla 2015; Shavit, Ferens, and Keller 2021). |
| Hardware Specification | Yes | All models are trained using one RTX-4090 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Unless otherwise stated, the code and hyper-parameter settings of the baselines strictly follow the official code release. The original Pose-Transformer can use multiple regression heads and scene-dependent latent embeddings to handle multiple scenes. We only apply multiple regression heads since it is sufficient to achieve similar performance (Appendix A.4). APRs are first learned on training data until converging in the initial training. In the main experiment, we follow the setup of large-scale continual learning (Cai, Sener, and Koltun 2021) and limit the computation budget of ConDo by first identifying the budget b = (epochs × iterations per epoch × batch size) / \|S_Ω\| for the baseline APR model to converge on the initial training data S_Ω. b represents the average number of iterations required per image. Then for every round of ConDo update with N images newly revealed, we assign N·b / batch size training iterations (see Appendix A.2 for actual numbers of b) with the same batch size as the initial training, so that the whole ConDo procedure, including initial training and all ConDo updates, consumes roughly only the budget to train one APR model from scratch on all revealed data. |
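The compute-budget rule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration of the arithmetic, not the paper's code; all function and variable names are mine:

```python
def initial_budget(epochs, iters_per_epoch, batch_size, n_initial_images):
    # b = (epochs * iterations per epoch * batch size) / |S_Omega|:
    # the average number of training iterations spent per image
    # during the initial training phase.
    return epochs * iters_per_epoch * batch_size / n_initial_images

def condo_update_iters(n_new_images, b, batch_size):
    # Each ConDo update round with N newly revealed images receives
    # N * b / batch_size iterations (same batch size as initial
    # training), so the total compute stays roughly equal to training
    # one APR model from scratch on all revealed data.
    return int(n_new_images * b / batch_size)

# Illustrative numbers (not from the paper; see Appendix A.2 for actual b):
b = initial_budget(epochs=300, iters_per_epoch=100,
                   batch_size=64, n_initial_images=6400)  # -> 300.0
iters = condo_update_iters(n_new_images=800, b=b, batch_size=64)  # -> 3750
```

Note that since iterations per epoch × batch size roughly equals the dataset size, b effectively reduces to the number of passes over the initial data, which is why it transfers naturally as a per-image budget to newly revealed images.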
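The hold-out scheme described in the Dataset Splits row (1/8 of each scan held out as continuous 16-image trajectories rather than scattered single frames) could be implemented along these lines. The paper does not release this exact splitting code; this is a hypothetical sketch with names of my choosing:

```python
import random

def hold_out_trajectories(image_ids, frac=1 / 8, traj_len=16, seed=0):
    """Split one scan into train / held-out eval images.

    Holds out roughly `frac` of the scan as continuous trajectories of
    `traj_len` consecutive images, mirroring the evaluation-split idea
    described in the paper (exact procedure is an assumption).
    """
    rng = random.Random(seed)
    n_held = int(len(image_ids) * frac)
    n_trajs = max(1, n_held // traj_len)
    # Sampled trajectory starts may overlap, in which case slightly
    # fewer than n_held images end up held out; acceptable for a sketch.
    starts = rng.sample(range(len(image_ids) - traj_len + 1), n_trajs)
    held = set()
    for s in starts:
        held.update(range(s, s + traj_len))
    eval_ids = [image_ids[i] for i in sorted(held)]
    train_ids = [im for i, im in enumerate(image_ids) if i not in held]
    return train_ids, eval_ids

# Example: a scan of 128 frames yields one contiguous 16-frame eval trajectory.
train, eval_set = hold_out_trajectories(list(range(128)))
```

Holding out contiguous trajectories is what makes the evaluation challenging: every eval frame is far (in trajectory space) from some training frame, instead of always having a near-identical training neighbor.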