Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision

Authors: Maoji Zheng, Ziyu Xu, Qiming Xia, Hai Wu, Chenglu Wen, Cheng Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the Waymo Open Dataset and nuScenes Dataset show that our method significantly outperforms other competitive methods by 23.7% and 10.3% in mAP, respectively. The results demonstrate the great label-efficient potential and advancement of our method."
Researcher Affiliation | Academia | Maoji Zheng1,2, Ziyu Xu1,2, Qiming Xia1,2, Hai Wu1,2, Chenglu Wen1,2*, Cheng Wang1,2. 1Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; 2Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China. EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology in prose and with block diagrams (e.g., Figure 3: Illustration of the Seg2Box framework), but does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper mentions using publicly available code from OpenPCDet (Team 2020) for its experiments: "We trained both Waymo Open Dataset and nuScenes Dataset for 30 epochs and selected the best validation accuracy epoch as a result. All those experiments were trained on 2 NVIDIA GeForce RTX 3090 GPUs with the ADAM optimizer." However, it does not state that the authors of this paper are releasing their own code for the Seg2Box method.
Open Datasets | Yes | "We validated our method on widely used Waymo Open Dataset (WOD) (Sun et al. 2020) and nuScenes Dataset (Caesar et al. 2020)."
Dataset Splits | Yes | "Waymo Open Dataset (WOD). For bounding box annotation, WOD (Sun et al. 2020) contains a total of 158k LiDAR frames for training and 40k LiDAR frames for validation. For semantic annotation, WOD annotates one frame every 7 frames on the top LiDAR scan, resulting in 23k frames for training. nuScenes Dataset. nuScenes (Caesar et al. 2020) ... It contains 1,000 sequences, with 700, 150, and 150 for training, validation, and testing, respectively."
Hardware Specification | Yes | "All those experiments were trained on 2 NVIDIA GeForce RTX 3090 GPUs with the ADAM optimizer."
Software Dependencies | No | The paper mentions using the 'ADAM optimizer' and adopting the implementation of 'publicly available code from OpenPCDet (Team 2020)', but does not specify version numbers for any programming languages, libraries, or toolboxes.
Experiment Setup | Yes | "In the pseudo-label generation stage, we used grid size r = 7 for Eq. 1 to calculate the Occupancy-Score of the pseudo-label. We used λ1 = λ2 = λ3 = 1/3 in Eq. 4 for the weights of MSF-Score. We used θL = 0.4 and θH = 0.8 in Eq. 5 to calculate the loss weight of each pseudo-label. We trained both Waymo Open Dataset and nuScenes Dataset for 30 epochs and selected the best validation accuracy epoch as a result. All those experiments were trained on 2 NVIDIA GeForce RTX 3090 GPUs with the ADAM optimizer."
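The hyperparameters quoted above can be collected into a single configuration sketch. This is purely illustrative: the paper releases no code, so every key name below is an assumption, not the authors' actual configuration format; only the values come from the paper.

```python
# Hedged sketch: hyperparameters reported in Seg2Box's experiment setup.
# All key names are hypothetical; the paper does not release config files.
SEG2BOX_CONFIG = {
    "pseudo_label": {
        "occupancy_grid_size": 7,                # grid size r in Eq. 1 (Occupancy-Score)
        "msf_score_weights": [1/3, 1/3, 1/3],    # lambda_1, lambda_2, lambda_3 in Eq. 4
        "loss_weight_thresholds": {              # theta_L and theta_H in Eq. 5
            "theta_L": 0.4,
            "theta_H": 0.8,
        },
    },
    "training": {
        "epochs": 30,                            # same for WOD and nuScenes
        "optimizer": "Adam",
        "num_gpus": 2,                           # NVIDIA GeForce RTX 3090
    },
}

# Sanity check: the MSF-Score weights form a convex combination.
assert abs(sum(SEG2BOX_CONFIG["pseudo_label"]["msf_score_weights"]) - 1.0) < 1e-9
```

A reproduction attempt would still need the unstated details (learning rate, batch size, data augmentation), which is consistent with the "Software Dependencies: No" finding above.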