Achieving Speed-Accuracy Balance in Vision-based 3D Occupancy Prediction via Geometric-Semantic Disentanglement

Authors: Yulin He, Wei Chen, Siqi Wang, Tianci Xun, Yusong Tan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method achieves 39.4% mIoU at 20 FPS on Occ3D-nuScenes, showcasing a state-of-the-art balance between accuracy and efficiency. We evaluate our model using the Occ3D-nuScenes (Tian et al. 2023) benchmark, which is based on the nuScenes (Caesar et al. 2020) dataset and constructed for the CVPR 2023 3D occupancy prediction challenge.
Researcher Affiliation | Academia | School of Computer, National University of Defense Technology, Changsha, China
Pseudocode | No | The paper describes the methods in prose and through architectural diagrams (Figure 3 and Figure 4) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/harrylin-hyl/GSD-OCC
Open Datasets | Yes | We evaluate our model using the Occ3D-nuScenes (Tian et al. 2023) benchmark, which is based on the nuScenes (Caesar et al. 2020) dataset and constructed for the CVPR 2023 3D occupancy prediction challenge.
Dataset Splits | Yes | The dataset consists of 1000 videos, split into 700 for training, 150 for validation, and 150 for testing.
Hardware Specification | Yes | During training, we use a batch size of 32 on 8 Nvidia A100 GPUs. ... During inference, we use a batch size of 1 on a single Nvidia A100 GPU. The FPS of all methods are evaluated on an Nvidia A100 GPU, except for FastOcc, which is reported using an Nvidia V100 GPU in its paper.
Software Dependencies | No | The paper mentions using ResNet-50 as the image backbone, the AdamW optimizer, and the mmdetection3d codebase, but does not provide specific version numbers for these software components or other key libraries.
Experiment Setup | Yes | We maintain a memory queue of length 15 to store historical features. For RLK-3DConv, we set the size of the convolution kernel to [11, 11, 1]. The steepness parameter r is set to 5 in geometric-semantic disentangled learning. During training, we use a batch size of 32 on 8 Nvidia A100 GPUs. Unless otherwise specified, all models are trained for 24 epochs using the AdamW optimizer (Loshchilov, Hutter et al. 2017) with a learning rate of 1e-4 and a weight decay of 0.05.
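The reported hyperparameters can be gathered into a single training configuration. Below is a minimal sketch in the dict-based config style used by mmdetection3d-family codebases; every key name here is an assumption for illustration, not the schema of the actual GSD-OCC repository:

```python
# Hypothetical training config collecting the hyperparameters reported in
# the paper; key names are illustrative, not the real GSD-OCC config schema.
train_cfg = dict(
    memory_queue_length=15,         # queue of historical features
    rlk_3dconv_kernel=[11, 11, 1],  # RLK-3DConv kernel size
    steepness_r=5,                  # r in geometric-semantic disentangled learning
    batch_size=32,                  # total batch, spread over 8 Nvidia A100 GPUs
    num_gpus=8,
    epochs=24,
    optimizer=dict(
        type="AdamW",
        lr=1e-4,
        weight_decay=0.05,
    ),
)

# Per-GPU batch size implied by the reported totals.
per_gpu_batch = train_cfg["batch_size"] // train_cfg["num_gpus"]
print(per_gpu_batch)  # → 4
```

A config like this makes the split between architectural constants (queue length, kernel size, r) and optimization settings (epochs, optimizer) explicit, which is how such codebases typically organize reproducible runs.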