X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
Authors: Yichen Xie, Chenfeng Xu, Chensheng Peng, Shuqi Zhao, Nhat Ho, Alexander Pham, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive results demonstrate the high-fidelity synthetic results of X-DRIVE for both point clouds and multi-view images, adhering to input conditions while ensuring reliable cross-modality consistency. Our code will be made publicly available at https://github.com/yichen928/X-Drive. Extensive experiments demonstrate the great ability of X-DRIVE in generating realistic multimodality sensor data. It notably outperforms previous specialized single-modality algorithms in the quality of both synthetic point clouds and multi-view images. |
| Researcher Affiliation | Collaboration | Yichen Xie1, Chenfeng Xu1, Chensheng Peng1, Shuqi Zhao1, Nhat Ho2, Alexander T. Pham3, Mingyu Ding1, Masayoshi Tomizuka1, Wei Zhan1 — 1UC Berkeley, 2UT Austin, 3Toyota North America |
| Pseudocode | No | The paper describes the methodology using textual explanations and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be made publicly available at https://github.com/yichen928/X-Drive. |
| Open Datasets | Yes | Dataset. We evaluate our method using nuScenes dataset (Caesar et al., 2020). |
| Dataset Splits | Yes | We follow the official setting to employ 700 driving scenes for training and 150 scenes for validation. |
| Hardware Specification | Yes | In all the stages, our model is trained using NVIDIA RTX A6000 GPUs. |
| Software Dependencies | Yes | We utilize the Stable-Diffusion pretrained weight to initialize the multi-view image branch with other newly added parameters randomly initialized. We follow MagicDrive (Gao et al., 2023) and RangeLDM (Hu et al., 2024a) to synthesize 224×400 multi-view camera images and 32×1024 point cloud range images. |
| Experiment Setup | Yes | In the first stage, VAE for LiDAR range image is trained using batch size 96 and learning rate 4e-4 for 200 epochs. The discriminator takes effect after 1000 iterations. In the second stage, we train the LiDAR LDM from scratch using batch size 96 and learning rate 1e-4 for 2000 epochs. The model includes the text prompt and 3D range-view bounding box condition modules with drop-rate 0.25 for either condition during training. The entire model is trained for 250 epochs with learning rate 8e-5 and batch size 24 in our main experiments. For ablation studies, we reduce the epoch number to 80 for efficiency. |
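The staged training schedule quoted in the "Experiment Setup" row can be summarized as a small config sketch. This is a hypothetical summary for readers attempting reproduction, not code from the X-Drive repository; the stage names, the `Stage` dataclass, and the `SCHEDULE` list are illustrative, while the numeric hyperparameters are taken verbatim from the quote above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    """One training stage as described in the paper's experiment setup."""
    name: str
    batch_size: int
    learning_rate: float
    epochs: int

# Stage names are our own labels; hyperparameters match the quoted setup.
SCHEDULE = [
    Stage("vae_lidar_range_image", batch_size=96, learning_rate=4e-4, epochs=200),
    Stage("lidar_ldm_from_scratch", batch_size=96, learning_rate=1e-4, epochs=2000),
    Stage("joint_cross_modality", batch_size=24, learning_rate=8e-5, epochs=250),
]

# Drop rate applied independently to the text-prompt and 3D range-view
# bounding-box conditions during LiDAR LDM training (classifier-free style).
CONDITION_DROP_RATE = 0.25

# In ablation studies the final stage runs 80 epochs instead of 250.
ABLATION_FINAL_STAGE_EPOCHS = 80
```

A checklist like this makes it easy to verify that a reimplementation matches the reported schedule stage by stage before comparing synthesis quality.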