Glad: A Streaming Scene Generator for Autonomous Driving

Authors: Bin Xie, Yingfei Liu, Tiancai Wang, Jiale Cao, Xiangyu Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are performed on the widely-used nuScenes dataset. Experimental results demonstrate that our proposed Glad achieves promising performance, serving as a strong baseline for online video generation." "We perform the experiments on the public autonomous driving dataset nuScenes, which demonstrates the efficacy of our Glad." "The experiments are performed on the widely-used dataset nuScenes."
Researcher Affiliation | Collaboration | Bin Xie1, Yingfei Liu2, Tiancai Wang2, Jiale Cao1, Xiangyu Zhang2,3; 1Tianjin University, 2MEGVII Technology, 3StepFun
Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | "We will release the source code and models publicly."
Open Datasets | Yes | "Extensive experiments are performed on the widely-used nuScenes dataset." nuScenes (Caesar et al., 2019), CARLA (Dosovitskiy et al., 2017), Waymo (Ettinger et al., 2021), and ONCE (Mao et al., 2021).
Dataset Splits | Yes | The nuScenes dataset was collected from 1000 different driving scenes in Boston and Singapore. These scenes are split into training, validation, and test sets: the training set contains 700 scenes, the validation set contains 150 scenes, and the test set contains 150 scenes. Every scene has 6 camera views, and each view records roughly 20 seconds of driving video. "We split each video into 2 clips to balance video length and data diversity."
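The clip-splitting rule quoted above ("split each video into 2 clips") can be sketched as follows. The paper's code is not released, so the function name and the 2 Hz keyframe assumption are illustrative, not taken from the authors' implementation:

```python
# Hypothetical sketch of the clip-splitting step: each ~20 s nuScenes
# scene video is divided into 2 contiguous clips to balance video
# length and data diversity. Not the authors' released code.

def split_scene_into_clips(frames, num_clips=2):
    """Split a scene's ordered frame list into num_clips contiguous clips."""
    clip_len = (len(frames) + num_clips - 1) // num_clips  # ceiling division
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]

# A 20 s scene sampled at the nuScenes 2 Hz keyframe rate has ~40 keyframes.
frames = list(range(40))
clips = split_scene_into_clips(frames)
print(len(clips), [len(c) for c in clips])  # 2 [20, 20]
```

Contiguous (rather than interleaved) splitting keeps each clip temporally coherent, which matters for a streaming video generator.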
Hardware Specification | Yes | "We train our models on 8 NVIDIA A100 GPUs with a mini-batch of 2 images." The inference time of the complete denoising process is reported on a single NVIDIA A100 GPU.
Software Dependencies | Yes | "Our Glad is implemented based on Stable Diffusion 2.1 (Rombach et al., 2022)."
Experiment Setup | Yes | We train our models on 8 NVIDIA A100 GPUs with a mini-batch of 2 images. During training, we first perform image-level pre-training with a constant learning rate of 4×10⁻⁵ for 1.25M iterations in total. Afterwards, we fine-tune our Glad on the nuScenes dataset with the same settings for 48 epochs. We split each video into 2 clips to balance video length and data diversity. During inference, we use the DDIM (Song et al., 2020) sampler with 25 sampling steps and a CFG scale of 5.0. Each image is generated at a spatial resolution of 256×3072 pixels covering 6 different views, then split into 6 images of 256×512 pixels for evaluation.
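Two inference details in the setup above are easy to make concrete: the standard classifier-free guidance rule with the reported scale of 5.0, and the split of the 256×3072 panorama into 6 views of 256×512. This is a minimal sketch under those stated numbers, not the authors' code; the function names are assumptions:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale=5.0):
    # Standard classifier-free guidance: eps = eps_u + s * (eps_c - eps_u).
    # scale=5.0 matches the CFG scale reported in the paper.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def split_panorama(panorama, num_views=6):
    # (H, W, C) panorama -> list of num_views arrays of shape (H, W // num_views, C).
    return np.split(panorama, num_views, axis=1)

# The 256x3072 multi-view canvas splits into 6 views of 256x512 for evaluation.
panorama = np.zeros((256, 3072, 3), dtype=np.uint8)
views = split_panorama(panorama)
print(len(views), views[0].shape)  # 6 (256, 512, 3)
```

Generating all 6 views as one wide image lets the denoiser share context across adjacent cameras; the per-view crop is only applied afterwards for metric computation.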