Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

Authors: Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the nuScenes dataset validate the effectiveness and scalability of our method, and demonstrate that PreWorld achieves competitive performance across 3D occupancy prediction, 4D occupancy forecasting and motion planning tasks.
Researcher Affiliation | Academia | Xiang Li, Pengfei Li, Yupeng Zheng, Wei Sun, Yan Wang, Yilun Chen; Institute for AI Industry Research (AIR), Tsinghua University; EMAIL; EMAIL
Pseudocode | No | The paper describes methods and architectures but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like procedures.
Open Source Code | Yes | Codes and models can be accessed at https://github.com/getterupper/PreWorld.
Open Datasets | Yes | Our experiments are conducted on the Occ3D-nuScenes benchmark (Tian et al., 2024), which provides dense semantic occupancy annotations for the widely used nuScenes dataset (Caesar et al., 2020).
Dataset Splits | Yes | The official split for training and validation sets is employed.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using Adam as the optimizer and references specific models like BEVStereo and FB-OCC, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, etc.).
Experiment Setup | Yes | For training, we set the batch size to 16, use Adam as the optimizer, and train with a learning rate of 1×10⁻⁴. All hyperparameters λ in the loss functions are set to 1.0. For the 3D occupancy prediction task, PreWorld undergoes 6 epochs of self-supervised pre-training and 12 epochs of fully-supervised fine-tuning. For the 4D occupancy forecasting and motion planning tasks, PreWorld undergoes 8 epochs of self-supervised pre-training and 18 epochs of fully-supervised fine-tuning.
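The reported hyperparameters can be collected into a single configuration sketch. This is a minimal illustration for readers attempting reproduction; the key names, stage labels, and helper function are assumptions, not taken from the released PreWorld code.

```python
# Illustrative summary of the training setup reported in the paper.
# All identifiers here are assumptions for readability, not from the repo.
TRAIN_CONFIG = {
    "batch_size": 16,
    "optimizer": "Adam",
    "learning_rate": 1e-4,       # reported as 1×10⁻⁴
    "loss_lambda": 1.0,          # all loss-weight hyperparameters λ set to 1.0
    "epochs": {
        # task: (self-supervised pre-training, fully-supervised fine-tuning)
        "3d_occupancy_prediction": (6, 12),
        "4d_forecasting_and_planning": (8, 18),
    },
    "hardware": "8x NVIDIA A100",
}

def total_epochs(task: str) -> int:
    """Total epochs across both training stages for a given task."""
    pretrain, finetune = TRAIN_CONFIG["epochs"][task]
    return pretrain + finetune
```

For example, `total_epochs("3d_occupancy_prediction")` returns 18 (6 pre-training plus 12 fine-tuning epochs).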