STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Authors: Jiawei Yang, Jiahui Huang, Boris Ivanovic, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Yue Wang, Marco Pavone

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on public datasets show that STORM achieves precise dynamic scene reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3 to +6.6 PSNR) and existing feed-forward approaches (+2.1 to +4.7 PSNR) in dynamic regions.
Researcher Affiliation Collaboration EMAIL, University of Southern California; EMAIL, Georgia Institute of Technology; EMAIL, Stanford University; EMAIL, NVIDIA Research. Equal advising.
Pseudocode No The paper describes the methodology in text and through figures but does not include any structured pseudocode or algorithm blocks.
Open Source Code No For more details, please visit our project page. The paper mentions a 'project page' but does not explicitly state that the source code for the described methodology is available there, nor does it provide a direct link to a code repository.
Open Datasets Yes We conduct extensive experiments on the Waymo Open Dataset (Sun et al., 2020), nuScenes (Caesar et al., 2020), and Argoverse 2 (Wilson et al.) to evaluate the performance of STORM.
Dataset Splits Yes We primarily conduct experiments on the Waymo Open Dataset (Sun et al., 2020), which contains 1,000 sequences of driving logs: 798 sequences for training and 202 for validation.
Hardware Specification Yes Speed metrics are estimated on a single A100 GPU.
Software Dependencies No Our GS backend is based on gsplat (Ye et al., 2024). For the LPIPS loss, we utilize a VGG-19-based (Simonyan & Zisserman, 2014) implementation. The paper mentions software components but does not provide specific version numbers for reproducibility.
Experiment Setup Yes We train our model for 100,000 iterations with a global batch size of 64 on NVIDIA A100 GPUs, using a learning rate of 4e-4. The training process utilizes the AdamW optimizer (Loshchilov & Hutter, 2019) along with a cosine learning rate scheduler that includes a linear warmup phase over the first 5,000 iterations. We set λlpips to 0.05, λsky to 0.1, and λreg to 5e-3 in all experiments.
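The reported training setup can be summarized in code. The sketch below is an assumption-laden reconstruction, not the authors' implementation: the warmup/cosine schedule shape (linear ramp to the base rate, then cosine decay to zero) and the dictionary of loss weights simply transcribe the hyperparameters quoted above; any detail not stated in the paper (e.g. a nonzero final learning-rate floor) is guessed.

```python
import math

# Hyperparameters quoted from the paper's experiment setup.
TOTAL_ITERS = 100_000
WARMUP_ITERS = 5_000
BASE_LR = 4e-4

# Loss weights reported in the paper (subscript names flattened).
LOSS_WEIGHTS = {"lpips": 0.05, "sky": 0.1, "reg": 5e-3}

def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup over the first 5,000 iterations,
    then cosine decay to zero over the remaining iterations."""
    if step < WARMUP_ITERS:
        return BASE_LR * step / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / (TOTAL_ITERS - WARMUP_ITERS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these definitions, `lr_at(0)` is 0, `lr_at(5_000)` equals the base rate 4e-4, and `lr_at(100_000)` decays to 0.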