Uni²Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

Authors: Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted across multiple dataset consolidation scenarios involving KITTI, Waymo, and nuScenes, demonstrating that our Uni2Det outperforms existing methods by a large margin in multi-dataset training. Notably, results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.
Researcher Affiliation | Collaboration | Yubin Wang (1), Zhikang Zou (2), Xiaoqing Ye (2), Xiao Tan (2), Errui Ding (2), Cairong Zhao (1); (1) School of Computer Science and Technology, Tongji University; (2) Baidu Inc. Corresponding author email: EMAIL
Pseudocode | No | The paper describes the methodology using textual descriptions and architectural diagrams (Figures 1, 2, 3), but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/ThomasWangY/Uni2Det.
Open Datasets | Yes | Our experiments are conducted on three commonly used autonomous driving datasets: Waymo (Sun et al., 2020), nuScenes (Caesar et al., 2020), and KITTI (Geiger et al., 2012).
Dataset Splits | Yes | nuScenes (Caesar et al., 2020) comprises 28,130 training samples and 6,019 validation samples collected using a 32-beam LiDAR. KITTI (Geiger et al., 2012) includes 7,481 annotated LiDAR frames collected via a 64-beam LiDAR. We utilize only 20% of the uniformly sampled frames of the Waymo dataset for model training. All experimental results presented in this paper are reported on the official validation set.
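The "20% of uniformly sampled frames" setting on Waymo amounts to keeping every fifth frame index. A minimal illustrative sketch of such uniform subsampling (the helper name and signature are hypothetical, not the authors' data loader):

```python
def uniform_subsample(frame_ids, keep_ratio=0.2):
    """Uniformly subsample an ordered list of frame ids.

    Keeps every round(1 / keep_ratio)-th frame, mirroring the paper's
    "20% of uniformly sampled frames" setting on Waymo (illustrative only).
    """
    stride = max(1, round(1 / keep_ratio))
    return frame_ids[::stride]

# 100 frames -> 20 frames, evenly spaced across the sequence
train_frames = uniform_subsample(list(range(100)), keep_ratio=0.2)
```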
Hardware Specification | Yes | The network is trained across 8 NVIDIA A800 GPUs, with a total training epoch count of 30.
Software Dependencies | No | The experiments are conducted using OpenPCDet (Team et al., 2020). Notably, differences in point cloud range significantly degrade cross-dataset detection accuracy; therefore, we align the point cloud range of all datasets to [−75.2, 75.2] m on the X and Y axes and [−2, 4] m on the Z axis. In all experimental settings, we follow Uni3D (Zhang et al., 2023) and employ the standard optimization techniques utilized by PV-RCNN (Shi et al., 2020a) and Voxel R-CNN (Deng et al., 2021).
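In OpenPCDet, aligning ranges across datasets corresponds to setting the same `POINT_CLOUD_RANGE` ([x_min, y_min, z_min, x_max, y_max, z_max], in meters) in each dataset's config. A sketch of the relevant fragment, assuming the standard OpenPCDet YAML layout (not the repository's exact file):

```yaml
# Shared across all dataset configs so voxel grids match (illustrative).
DATA_CONFIG:
    POINT_CLOUD_RANGE: [-75.2, -75.2, -2.0, 75.2, 75.2, 4.0]
```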
Experiment Setup | Yes | For the balancing ratio α in our proposed mean-shifted batch normalization, we set α = 0.1 for Voxel R-CNN and α = 0.5 for PV-RCNN. Training uses the Adam optimizer with an initial learning rate of 0.01 and the one-cycle learning rate decay strategy. The network is trained across 8 NVIDIA A800 GPUs, with the total number of training epochs set to 30. For the experiments on Waymo-KITTI and nuScenes-KITTI consolidations, the weight decay is set to 0.01, while for Waymo-nuScenes consolidation it is set to 0.001.
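The one-cycle decay strategy warms the learning rate up to a peak and then anneals it toward a small final value. A self-contained sketch with cosine annealing; the warmup fraction and divide factors below are common defaults of this policy, not values reported in the paper:

```python
import math

def one_cycle_lr(step, total_steps, max_lr=0.01, pct_start=0.3,
                 div=25, final_div=1e4):
    """Cosine-annealed one-cycle schedule (illustrative).

    Warms up from max_lr/div to max_lr over the first pct_start of
    training, then anneals down to max_lr/final_div.
    """
    warmup = int(total_steps * pct_start)
    init_lr, final_lr = max_lr / div, max_lr / final_div
    if step < warmup:
        t = step / max(1, warmup)
        return init_lr + (max_lr - init_lr) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warmup) / max(1, total_steps - warmup)
    return final_lr + (max_lr - final_lr) * (1 + math.cos(math.pi * t)) / 2
```

For the reported setup, `total_steps` would be 30 epochs times the number of batches per epoch, with `max_lr = 0.01`.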