MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Yonggang Zhang, Zi Huang, Yadan Luo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method was rigorously tested against existing test-time adaptation strategies across three datasets and eight types of corruptions, demonstrating superior adaptability to dynamic scenes and conditions. Notably, it achieved a 67.3% improvement in a challenging cross-corruption scenario, offering a more comprehensive benchmark for adaptation. |
| Researcher Affiliation | Academia | Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Yonggang Zhang, Zi Huang, Yadan Luo (The University of Queensland; Hong Kong Baptist University). Correspondence to Yadan Luo <EMAIL>. |
| Pseudocode | Yes | The overall workflow of our method unfolds in three phases, as shown in Fig. 4 and Algorithm 1 (in Appendix). |
| Open Source Code | Yes | Source code: https://github.com/zhuoxiao-chen/MOS. |
| Open Datasets | Yes | Datasets and TTA-3OD Tasks. We perform extensive experiments across three widely used LiDAR-based 3D object detection datasets: KITTI (Geiger et al., 2012), Waymo (Sun et al., 2020), and nuScenes (Caesar et al., 2020), along with a recently introduced dataset simulating real-world corruptions, KITTI-C (Kong et al., 2023), for TTA-3OD challenges. |
| Dataset Splits | No | The paper uses well-known public datasets like KITTI, Waymo, and nu Scenes, but it does not explicitly provide the specific training/test/validation split percentages or sample counts used in its experiments within the main text or appendix. |
| Hardware Specification | Yes | Our code is developed on the OpenPCDet (Team, 2020) point cloud detection framework, and operates on a single NVIDIA RTX A6000 GPU with 48 GB of memory. |
| Software Dependencies | Yes | Our code is developed on the OpenPCDet (Team, 2020) point cloud detection framework. |
| Experiment Setup | Yes | We choose a batch size of 8 and set the hyperparameter L = 112 across all experiments. We set the model bank size to K = 5 to balance performance and memory usage. For evaluation, we use the KITTI benchmark's official metrics, reporting average precision for the car class in both 3D (i.e., AP3D) and bird's-eye view (i.e., APBEV) over 40 recall positions, with a 0.7 IoU threshold. We set Sfeat to a small positive value ϵ = 0.01 once rank(·) reaches D, to ensure G is invertible. The detection model is pretrained on the training set of the source dataset, and subsequently adapted and tested on the validation set of KITTI. Augmentations. We adopt data augmentation strategies from prior studies (Yang et al., 2022; 2021; Luo et al., 2021; Chen et al., 2023a) for methods requiring augmentations, such as MemCLR (VS et al., 2023), CoTTA (Wang et al., 2022a), and MOS. While CoTTA suggests employing multiple augmentations, our empirical results indicate that for TTA-3OD, using only a single random world scaling enhances performance, whereas additional augmentations diminish it. Consequently, following Luo et al. (2021), we apply random world scaling for the mean-teacher baselines, with strong augmentation (scaling between 0.9 and 1.1) and weak augmentation (scaling between 0.95 and 1.05) for all test-time domain adaptation tasks. Pseudo-labeling. We directly apply the pseudo-labeling strategies from (Yang et al., 2021; 2022) to CoTTA and MOS for self-training, using the default configurations. Baseline Losses. For Tent (Wang et al., 2021) and SAR (Niu et al., 2023), which compute the entropy minimization loss, we sum the losses over classification logits for all proposals from the first detection stage. For MemCLR (VS et al., 2023), we integrate its implementation into 3D detectors by reading/writing pooled region-of-interest (RoI) features extracted from the second detection stage, and compute the memory contrastive loss. For all baseline methods, we use the default hyperparameters from their implementation code. |
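The random world scaling described in the setup (strong scaling in [0.9, 1.1] for the student, weak scaling in [0.95, 1.05] for the teacher) can be sketched as below. This is a minimal illustration, not the paper's code: the function name `random_world_scaling` and the assumed point/box layouts (points as `(N, 4)` with xyz + intensity, boxes as `(M, 7)` with center, size, and heading, as in OpenPCDet) are assumptions.

```python
import numpy as np

def random_world_scaling(points, boxes, scale_range):
    """Scale the whole LiDAR scene (points and box annotations) by a single
    random factor drawn uniformly from scale_range.

    points: (N, 4) array of x, y, z, intensity.
    boxes:  (M, 7) array of cx, cy, cz, l, w, h, heading.
    """
    scale = np.random.uniform(*scale_range)
    points, boxes = points.copy(), boxes.copy()
    points[:, :3] *= scale   # scale xyz coordinates; intensity untouched
    boxes[:, :6] *= scale    # scale box centers and sizes; heading untouched
    return points, boxes

# Ranges reported in the setup: strong for the student view, weak for the teacher.
STRONG_SCALE = (0.9, 1.1)
WEAK_SCALE = (0.95, 1.05)
```

In a mean-teacher loop, the teacher would see the weakly scaled scene and the student the strongly scaled one, with the same scale applied to points and boxes so labels stay consistent.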
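The baseline loss for Tent/SAR, summing prediction entropies over the classification logits of all first-stage proposals, can be sketched as follows. This is a hedged NumPy sketch of generic entropy minimization, not the authors' implementation; the function name and the `(N, C)` logits layout are assumptions.

```python
import numpy as np

def entropy_minimization_loss(proposal_logits):
    """Sum of per-proposal softmax entropies.

    proposal_logits: (N, C) array of raw classification logits,
    one row per first-stage proposal.
    """
    # Numerically stable softmax.
    z = proposal_logits - proposal_logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Per-proposal entropy, summed over proposals (epsilon avoids log(0)).
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return entropy.sum()
```

Minimizing this quantity over the test stream pushes each proposal's class distribution toward a confident (low-entropy) prediction, which is the core idea behind both Tent and SAR.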