Achieving Speed-Accuracy Balance in Vision-based 3D Occupancy Prediction via Geometric-Semantic Disentanglement
Authors: Yulin He, Wei Chen, Siqi Wang, Tianci Xun, Yusong Tan
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves 39.4% mIoU at 20 FPS on Occ3D-nuScenes, showcasing a state-of-the-art balance between accuracy and efficiency. We evaluate our model using the Occ3D-nuScenes (Tian et al. 2023) benchmark, which is based on the nuScenes (Caesar et al. 2020) dataset and was constructed for the CVPR 2023 3D occupancy prediction challenge. |
| Researcher Affiliation | Academia | School of Computer, National University of Defense Technology, Changsha, China |
| Pseudocode | No | The paper describes the methods in prose and through architectural diagrams (Figure 3 and Figure 4) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/harrylin-hyl/GSD-OCC |
| Open Datasets | Yes | We evaluate our model using the Occ3D-nuScenes (Tian et al. 2023) benchmark, which is based on the nuScenes (Caesar et al. 2020) dataset and was constructed for the CVPR 2023 3D occupancy prediction challenge. |
| Dataset Splits | Yes | The dataset consists of 1000 videos, split into 700 for training, 150 for validation, and 150 for testing. |
| Hardware Specification | Yes | During training, we use a batch size of 32 on 8 Nvidia A100 GPUs. ... During inference, we use a batch size of 1 on a single Nvidia A100 GPU. The FPS of all methods is evaluated on an Nvidia A100 GPU, except for FastOCC, which is reported using an Nvidia V100 GPU in its paper. |
| Software Dependencies | No | The paper mentions using ResNet-50 as the image backbone, the AdamW optimizer, and the mmdetection3d codebase, but does not provide specific version numbers for these software components or other key libraries. |
| Experiment Setup | Yes | We maintain a memory queue of length 15 to store historical features. For RLK-3DConv, we set the size of the convolution kernel to [11, 11, 1]. The steepness parameter r is set to 5 in geometric-semantic disentangled learning. During training, we use a batch size of 32 on 8 Nvidia A100 GPUs. Unless otherwise specified, all models are trained for 24 epochs using the AdamW optimizer (Loshchilov, Hutter et al. 2017) with a learning rate of 1 × 10⁻⁴ and a weight decay of 0.05. |
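
The reported setup can be collected into a single configuration object for anyone attempting a reproduction. The sketch below is illustrative only: the class name `TrainConfig`, its field names, and the `SPLITS` dictionary are hypothetical and do not come from the GSD-OCC repository. The per-GPU batch size assumes the global batch of 32 is divided evenly across the 8 GPUs, which the paper does not state explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""

    memory_queue_length: int = 15                   # historical feature queue
    rlk_kernel_size: Tuple[int, int, int] = (11, 11, 1)  # RLK-3DConv kernel
    steepness_r: float = 5.0                        # geometric-semantic disentangled learning
    total_batch_size: int = 32                      # global training batch size
    num_gpus: int = 8                               # Nvidia A100 GPUs used for training
    epochs: int = 24
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 0.05

    @property
    def per_gpu_batch_size(self) -> int:
        # Assumption: the global batch is split evenly across GPUs.
        if self.total_batch_size % self.num_gpus != 0:
            raise ValueError("global batch size is not divisible by GPU count")
        return self.total_batch_size // self.num_gpus


# Scene counts per split, as reported in the Dataset Splits row.
SPLITS = {"train": 700, "val": 150, "test": 150}

cfg = TrainConfig()
print(cfg.per_gpu_batch_size)   # samples per GPU under the even-split assumption
print(sum(SPLITS.values()))     # total number of videos across splits
```

Because the paper gives no version numbers for its software dependencies, a configuration like this records only the hyperparameters; library versions would still need to be taken from the released code.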