DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
Authors: Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the CarlaSC and Waymo datasets demonstrate that DynamicCity significantly outperforms existing state-of-the-art 4D occupancy generation methods across multiple metrics. The code and models have been released to facilitate future research. |
| Researcher Affiliation | Collaboration | ¹Shanghai AI Laboratory, ²Carnegie Mellon University, ³National University of Singapore, ⁴S-Lab, Nanyang Technological University |
| Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (Figures 2, 3, 4, 5) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models have been released to facilitate future research. For more detailed examples, kindly refer to our Project Page: https://dynamic-city.github.io. |
| Open Datasets | Yes | We train the proposed model on the Occ3D-Waymo, Occ3D-nuScenes, and CarlaSC datasets. The former two from Occ3D (Tian et al., 2023) are derived from Waymo (Sun et al., 2020) and nuScenes (Caesar et al., 2020)... The CarlaSC dataset (Wilson et al., 2022) is a synthetic occupancy dataset... E.1 PUBLIC DATASETS USED: nuScenes ... Waymo Open Dataset ... CarlaSC ... Occ3D |
| Dataset Splits | No | The paper mentions the total number of 'training scenes' for each dataset (e.g., 'Occ3D-Waymo dataset contains 798 training scenes', 'CarlaSC dataset contains 6 training scenes', 'Occ3D-nuScenes dataset contains 600 scenes') but does not specify how these datasets are further split into training, validation, and test sets, nor does it refer to specific predefined splits. |
| Hardware Specification | Yes | Our experiments are conducted using eight NVIDIA A100-80G GPUs. |
| Software Dependencies | No | The paper states 'We implement both the VAE and DiT models using PyTorch (Paszke et al., 2019)' and mentions 'FlashAttention (Dao et al., 2022)'. While it identifies key software components, it does not provide specific version numbers for these or other libraries required for replication. |
| Experiment Setup | Yes | The global batch size used for training the VAE is 8, while the global batch size for training the DiT is 128. Our latent HexPlane H is compressed to half the size of the input Q in each dimension, with the latent channels C = 16. The weights for the Lovász-softmax and KL terms are set to 1 and 0.005, respectively. The learning rate for the VAE is 10⁻³, while the learning rate for the DiT is 10⁻⁴. |
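The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a minimal illustration only; the dataclass and field names are assumptions for readability, not identifiers from the released DynamicCity code:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hyperparameters as reported in the paper (field names are illustrative)."""
    vae_batch_size: int = 8        # global batch size for VAE training
    dit_batch_size: int = 128      # global batch size for DiT training
    latent_channels: int = 16      # channels C of the latent HexPlane H
    downsample_factor: int = 2     # H is half the size of input Q per dimension
    lovasz_weight: float = 1.0     # weight on the Lovász-softmax term
    kl_weight: float = 0.005       # weight on the KL term
    vae_lr: float = 1e-3           # learning rate for the VAE
    dit_lr: float = 1e-4           # learning rate for the DiT


cfg = TrainingConfig()
print(cfg.vae_lr, cfg.dit_lr)  # 0.001 0.0001
```

Collecting the reported values this way makes it easy to spot which settings the paper fixes (batch sizes, loss weights, learning rates) versus which remain unspecified, such as library versions and dataset splits.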