MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Yonggang Zhang, Zi Huang, Yadan Luo
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method was rigorously tested against existing test-time adaptation strategies across three datasets and eight types of corruptions, demonstrating superior adaptability to dynamic scenes and conditions. Notably, it achieved a 67.3% improvement in a challenging cross-corruption scenario, offering a more comprehensive benchmark for adaptation. |
| Researcher Affiliation | Academia | Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Yonggang Zhang, Zi Huang, Yadan Luo (The University of Queensland; Hong Kong Baptist University). Correspondence to Yadan Luo <EMAIL>. |
| Pseudocode | Yes | The overall workflow of our method unfolds in three phases, as shown in Fig. 4 and Algorithm 1 (in Appendix). |
| Open Source Code | Yes | Source code: https://github.com/zhuoxiao-chen/MOS. |
| Open Datasets | Yes | Datasets and TTA-3OD Tasks. We perform extensive experiments across three widely used LiDAR-based 3D object detection datasets: KITTI (Geiger et al., 2012), Waymo (Sun et al., 2020), and nuScenes (Caesar et al., 2020), along with a recently introduced dataset simulating real-world corruptions, KITTI-C (Kong et al., 2023), for TTA-3OD challenges. |
| Dataset Splits | No | The paper uses well-known public datasets like KITTI, Waymo, and nu Scenes, but it does not explicitly provide the specific training/test/validation split percentages or sample counts used in its experiments within the main text or appendix. |
| Hardware Specification | Yes | Our code is developed on the OpenPCDet (Team, 2020) point cloud detection framework, and operates on a single NVIDIA RTX A6000 GPU with 48 GB of memory. |
| Software Dependencies | Yes | Our code is developed on the OpenPCDet (Team, 2020) point cloud detection framework. |
| Experiment Setup | Yes | We choose a batch size of 8 and set the hyperparameter L = 112 across all experiments. We set the model bank size to K = 5 to balance performance and memory usage. For evaluation, we use the KITTI benchmark's official metrics, reporting average precision for the car class in both 3D (i.e., AP3D) and bird's-eye view (i.e., APBEV) over 40 recall positions, with a 0.7 IoU threshold. We set Sfeat to a small positive value ϵ = 0.01 once rank(·) reaches D, to ensure G is invertible. The detection model is pretrained on the training set of the source dataset, and subsequently adapted and tested on the validation set of KITTI. Augmentations. We adopt data augmentation strategies from prior studies (Yang et al., 2022; 2021; Luo et al., 2021; Chen et al., 2023a) for methods requiring augmentations, such as MemCLR (VS et al., 2023), CoTTA (Wang et al., 2022a), and MOS. While CoTTA suggests employing multiple augmentations, our empirical results indicate that for TTA-3OD, using only a single random world scaling enhances performance, whereas additional augmentations diminish it. Consequently, following Luo et al. (2021), we apply random world scaling for the mean-teacher baselines, with strong augmentation (scaling between 0.9 and 1.1) and weak augmentation (scaling between 0.95 and 1.05) for all test-time domain adaptation tasks. Pseudo-labeling. We directly apply the pseudo-labeling strategies from (Yang et al., 2021; 2022) to CoTTA and MOS for self-training, using the default configurations. Baseline Losses. For Tent (Wang et al., 2021) and SAR (Niu et al., 2023), which compute the entropy minimization loss, we sum the losses over classification logits for all proposals from the first detection stage. For MemCLR (VS et al., 2023), we integrate its implementation into 3D detectors by reading/writing pooled region-of-interest (RoI) features extracted from the second detection stage, and compute the memory contrastive loss. For all baseline methods, we use the default hyperparameters from their implementation code. |
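The random world scaling described in the setup (strong scaling in [0.9, 1.1] for the student, weak scaling in [0.95, 1.05] for the teacher) can be sketched as below. This is a minimal illustration, not the paper's code: the function name `random_world_scaling` and the assumed point/box layouts (points as `(N, 4)` with xyz + intensity, boxes as `(M, 7)` with center, size, and heading, as in OpenPCDet) are assumptions.

```python
import numpy as np

def random_world_scaling(points, boxes, scale_range):
    """Scale the whole LiDAR scene (points and box annotations) by a single
    random factor drawn uniformly from scale_range.

    points: (N, 4) array of x, y, z, intensity.
    boxes:  (M, 7) array of cx, cy, cz, l, w, h, heading.
    """
    scale = np.random.uniform(*scale_range)
    points, boxes = points.copy(), boxes.copy()
    points[:, :3] *= scale   # scale xyz coordinates; intensity untouched
    boxes[:, :6] *= scale    # scale box centers and sizes; heading untouched
    return points, boxes

# Ranges reported in the setup: strong for the student view, weak for the teacher.
STRONG_SCALE = (0.9, 1.1)
WEAK_SCALE = (0.95, 1.05)
```

In a mean-teacher loop, the teacher would see the weakly scaled scene and the student the strongly scaled one, with the same scale applied to points and boxes so labels stay consistent.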
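The baseline loss for Tent/SAR, summing prediction entropies over the classification logits of all first-stage proposals, can be sketched as follows. This is a hedged NumPy sketch of generic entropy minimization, not the authors' implementation; the function name and the `(N, C)` logits layout are assumptions.

```python
import numpy as np

def entropy_minimization_loss(proposal_logits):
    """Sum of per-proposal softmax entropies.

    proposal_logits: (N, C) array of raw classification logits,
    one row per first-stage proposal.
    """
    # Numerically stable softmax.
    z = proposal_logits - proposal_logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Per-proposal entropy, summed over proposals (epsilon avoids log(0)).
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return entropy.sum()
```

Minimizing this quantity over the test stream pushes each proposal's class distribution toward a confident (low-entropy) prediction, which is the core idea behind both Tent and SAR.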