GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Authors: Jinqing Zhang, Yanan Zhang, Yunlong Qi, Zehua Fu, Qingjie Liu, Yunhong Wang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on the nuScenes dataset, and GeoBEV reaches a new state-of-the-art result of 66.2% NDS for multi-view 3D object detection, highlighting its effectiveness. |
| Researcher Affiliation | Collaboration | (1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China; (2) Beijing Jingwei Hirain Technologies Co., Inc.; (3) Hangzhou Innovation Institute, Beihang University, Hangzhou, China; (4) Zhongguancun Laboratory, Beijing, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes methods and processes through narrative text, mathematical formulations (Equations 1–7), and visual diagrams (Figures 2, 3, and 4). However, it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/mengtan00/GeoBEV.git |
| Open Datasets | Yes | We evaluate our proposed method on the nuScenes (Caesar et al. 2020) dataset, a commonly used autonomous driving benchmark. |
| Dataset Splits | Yes | The 1000 scenarios are split into a training set (700 scenarios), a validation set (150 scenarios), and a test set (150 scenarios). |
| Hardware Specification | No | The paper specifies image backbones (ResNet50, ResNet101, VoVNet-99) and image sizes used for experiments, but does not provide details on the specific hardware (e.g., GPU models, CPU types) on which these experiments were run. |
| Software Dependencies | No | The paper mentions using the CBGS strategy and BEV-Paste, and pre-trained HTC models, but does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | These models are trained for 20 epochs with the CBGS strategy (Zhu et al. 2019). Besides regular data augmentation, BEV-Paste (Zhang et al. 2023a) is adopted to alleviate overfitting during the long training process. Future frames and test-time augmentation are not adopted. For the ablation study, ResNet50 is used as the image backbone and the models are trained for 24 epochs without the CBGS strategy. |
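The dataset-split and experiment-setup rows above can be summarized in a small, self-contained Python sketch. This is illustrative only: the names `NUSCENES_SPLITS`, `MAIN_CONFIG`, and `describe` are hypothetical and not taken from the GeoBEV repository; the split counts are the standard nuScenes scene splits and the hyperparameters are those quoted from the paper.

```python
# Hypothetical summary of the reported setup; not code from the GeoBEV repo.

# Standard nuScenes scene splits (1000 scenes total).
NUSCENES_SPLITS = {"train": 700, "val": 150, "test": 150}

# Main-results configuration as described in the Experiment Setup row.
MAIN_CONFIG = {
    "backbone": "ResNet50",          # paper also reports ResNet101 / VoVNet-99
    "epochs": 20,
    "cbgs": True,                    # class-balanced grouping and sampling
    "bev_paste": True,               # extra augmentation against overfitting
    "future_frames": False,          # not adopted
    "test_time_augmentation": False, # not adopted
}

# Ablation-study configuration: longer schedule, no CBGS.
ABLATION_CONFIG = {**MAIN_CONFIG, "epochs": 24, "cbgs": False}

def describe(config):
    """Render a configuration as a short human-readable summary."""
    return (f"{config['backbone']}, {config['epochs']} epochs, "
            f"CBGS={'on' if config['cbgs'] else 'off'}")
```

For example, `describe(ABLATION_CONFIG)` yields `"ResNet50, 24 epochs, CBGS=off"`, matching the ablation protocol stated above.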