Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection
Authors: Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted across multiple dataset consolidation scenarios involving KITTI, Waymo, and nuScenes, demonstrating that our Uni2Det outperforms existing methods by a large margin in multi-dataset training. Notably, results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method. |
| Researcher Affiliation | Collaboration | Yubin Wang1, Zhikang Zou2, Xiaoqing Ye2, Xiao Tan2, Errui Ding2, Cairong Zhao1. 1School of Computer Science and Technology, Tongji University; 2Baidu Inc. Corresponding Author. Email: EMAIL |
| Pseudocode | No | The paper describes the methodology using textual descriptions and architectural diagrams (Figures 1, 2, 3), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ThomasWangY/Uni2Det. |
| Open Datasets | Yes | Our experiments are conducted on three commonly used autonomous driving datasets: Waymo (Sun et al., 2020), nuScenes (Caesar et al., 2020), and KITTI (Geiger et al., 2012). |
| Dataset Splits | Yes | nuScenes (Caesar et al., 2020) comprises 28,130 training samples and 6,019 validation samples collected using 32-beam LiDAR. KITTI (Geiger et al., 2012) includes 7,481 annotated LiDAR frames collected via 64-beam LiDAR. We utilize only 20% of the uniformly sampled frames on the Waymo dataset for model training. All experimental results presented in this paper are reported on the official validation set. |
| Hardware Specification | Yes | The network is trained across 8 NVIDIA A800 GPUs, with a total training epoch set to 30. |
| Software Dependencies | No | The experiments are conducted using OpenPCDet (Team et al., 2020). Particularly, we note that differences in point cloud range significantly degrade cross-dataset detection accuracy. Therefore, we align the point cloud range of all datasets to [-75.2, 75.2]m for the X and Y axes and [-2, 4]m for the Z-axis. In all experimental settings, we follow Uni3D (Zhang et al., 2023) and employ the standard optimization techniques utilized by PV-RCNN (Shi et al., 2020a) and Voxel R-CNN (Deng et al., 2021). |
| Experiment Setup | Yes | For the balancing ratio α in our proposed mean-shifted batch normalization, we set α = 0.1 for Voxel R-CNN and α = 0.5 for PV-RCNN. This involves using the Adam optimizer with an initial learning rate of 0.01 and implementing the OneCycle learning rate decay strategy. The network is trained across 8 NVIDIA A800 GPUs, with a total training epoch set to 30. For the experiments on Waymo-KITTI and nuScenes-KITTI consolidations, the weight decay is set to 0.01, while for Waymo-nuScenes consolidation, it is set to 0.001. |
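The point-cloud-range alignment described in the Software Dependencies row can be sketched as a simple crop. This is a minimal illustration, not the paper's code: the function name `crop_to_unified_range` is hypothetical, and the signed interval bounds assume the ranges are [-75.2, 75.2] m for X/Y and [-2, 4] m for Z, as is conventional in OpenPCDet-style configs.

```python
import numpy as np

# Assumed unified point cloud range (x_min, y_min, z_min, x_max, y_max, z_max):
# [-75.2, 75.2] m for X and Y, [-2, 4] m for Z.
POINT_CLOUD_RANGE = np.array([-75.2, -75.2, -2.0, 75.2, 75.2, 4.0])

def crop_to_unified_range(points: np.ndarray,
                          pc_range: np.ndarray = POINT_CLOUD_RANGE) -> np.ndarray:
    """Keep only points (N, 3+) that fall inside the unified range.

    Hypothetical helper: drops points outside the shared range so that
    all datasets present the detector with the same spatial extent.
    """
    mask = ((points[:, 0] >= pc_range[0]) & (points[:, 0] <= pc_range[3]) &
            (points[:, 1] >= pc_range[1]) & (points[:, 1] <= pc_range[4]) &
            (points[:, 2] >= pc_range[2]) & (points[:, 2] <= pc_range[5]))
    return points[mask]
```

In OpenPCDet itself this is normally expressed declaratively via the `POINT_CLOUD_RANGE` field of a dataset config rather than as a standalone function.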
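The Experiment Setup row mentions a balancing ratio α for the paper's mean-shifted batch normalization but the report does not quote the formula. The sketch below is a hypothetical reconstruction, assuming α interpolates between a dataset-specific running mean and the current batch mean before normalizing; the actual Uni2Det formulation may differ.

```python
import numpy as np

def mean_shifted_bn(x: np.ndarray, dataset_mean: np.ndarray,
                    alpha: float = 0.1, eps: float = 1e-5) -> np.ndarray:
    """Hypothetical sketch of mean-shifted batch normalization.

    Blends a per-dataset running mean with the current batch mean via
    the balancing ratio alpha (assumption, not the paper's definition),
    then normalizes with the batch variance. With alpha = 0 this
    reduces to standard batch normalization statistics.
    """
    batch_mean = x.mean(axis=0)
    shifted_mean = alpha * dataset_mean + (1.0 - alpha) * batch_mean
    var = x.var(axis=0)
    return (x - shifted_mean) / np.sqrt(var + eps)
```

Under this reading, the reported settings (α = 0.1 for Voxel R-CNN, α = 0.5 for PV-RCNN) would control how strongly each dataset's own statistics shift the shared normalization.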