STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Authors: Jiawei Yang, Jiahui Huang, Boris Ivanovic, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Yue Wang, Marco Pavone
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public datasets show that STORM achieves precise dynamic scene reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3 to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic regions. |
| Researcher Affiliation | Collaboration | EMAIL, University of Southern California; EMAIL, Georgia Institute of Technology; EMAIL, Stanford University; EMAIL, NVIDIA Research. Equal advising. |
| Pseudocode | No | The paper describes the methodology in text and through figures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | For more details, please visit our project page. The paper mentions a 'project page' but does not explicitly state that the source code for the described methodology is available there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We conduct extensive experiments on the Waymo Open dataset (Sun et al., 2020), nuScenes (Caesar et al., 2020) and Argoverse2 (Wilson et al.) to evaluate the performance of STORM. |
| Dataset Splits | Yes | We primarily conduct experiments on the Waymo Open Dataset (Sun et al., 2020), which contains 1,000 sequences of driving logs: 798 sequences for training and 202 for validation. |
| Hardware Specification | Yes | Speed metrics are estimated on a single A100 GPU. |
| Software Dependencies | No | Our GS backend is based on gsplat (Ye et al., 2024). For the LPIPS loss, we utilize a VGG-19-based (Simonyan & Zisserman, 2014) implementation. The paper mentions software components but does not provide specific version numbers for reproducibility. |
| Experiment Setup | Yes | We train our model for 100,000 iterations with a global batch size of 64 on NVIDIA A100 GPUs, using a learning rate of 4 × 10⁻⁴. The training process utilizes the AdamW optimizer (Loshchilov & Hutter, 2019) along with a cosine learning rate scheduler that includes a linear warmup phase over the first 5,000 iterations. We set λlpips to 0.05, λsky to 0.1, and λreg to 5e-3 in all experiments. |
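The training schedule quoted in the Experiment Setup row (linear warmup for 5,000 iterations, then cosine decay over the remaining steps, base learning rate 4 × 10⁻⁴) can be sketched as a plain function. This is not the authors' code; the function name and the exact decay-to-zero behavior are assumptions, but the constants come directly from the paper.

```python
# Hedged sketch of the LR schedule described in the paper: linear warmup
# for the first 5,000 iterations, then cosine decay over the rest.
# `lr_at` is a hypothetical helper name, not from the paper.
import math

BASE_LR = 4e-4         # learning rate from the paper
WARMUP_ITERS = 5_000   # linear warmup phase
TOTAL_ITERS = 100_000  # total training iterations

def lr_at(step: int) -> float:
    """Learning rate at a given training step."""
    if step < WARMUP_ITERS:
        return BASE_LR * step / WARMUP_ITERS  # linear ramp from 0 to BASE_LR
    # Cosine decay from BASE_LR toward 0 over the remaining iterations
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

# Loss weights quoted in the table (assumed to scale additive loss terms):
LAMBDAS = {"lpips": 0.05, "sky": 0.1, "reg": 5e-3}
```

In a PyTorch training loop this would typically be realized with `torch.optim.AdamW` plus a warmup-then-cosine scheduler; the pure function above just makes the quoted hyperparameters concrete.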