UniDrive: Towards Universal Driving Perception Across Camera Configurations

Authors: Ye Li, Wenzhao Zheng, Xiaonan Huang, Kurt Keutzer

ICLR 2025

Reproducibility Variable — Result — LLM Response
Research Type: Experimental — To evaluate the effectiveness of the framework, the authors collect a dataset in CARLA by driving the same routes while only modifying the camera configurations. Experimental results demonstrate that the method trained on one specific camera configuration can generalize to varying configurations with minor performance degradation. They validate the framework in CARLA by training and testing models on different camera configurations, showing that the approach significantly reduces performance degradation while maintaining adaptability across diverse sensor setups. Tables 1 and 2 present 3D object detection results for BEVFusion-C (Liu et al., 2023b) and UniDrive, alongside comparative studies of camera perception across configurations. Two analysis sections then examine the interplay between the proposed virtual projection strategy and perception performance: 1) What is the impact of camera extrinsics and intrinsics on cross-configuration perception, and how does UniDrive handle these parameters separately? 2) What is the impact of inconsistency among multi-camera intrinsics on perception, and how does UniDrive address this inconsistency?
Researcher Affiliation: Academia — Ye Li (1), Wenzhao Zheng (2), Xiaonan Huang (1), Kurt Keutzer (2); 1: University of Michigan, Ann Arbor; 2: University of California, Berkeley.
Pseudocode: Yes — Algorithm 1 (Virtual Camera Projection) and Algorithm 2 (Virtual Camera Configuration Optimization).
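The paper's Algorithm 1 (Virtual Camera Projection) is only named here, not reproduced. As a rough, generic sketch of the pinhole geometry such a projection relies on (the function names, matrices, and image size below are illustrative placeholders, not the authors' implementation), a 3D point in the ego frame can be mapped to pixel coordinates of a camera with known extrinsics and intrinsics:

```python
# Generic pinhole projection sketch: ego-frame 3D point -> pixel coordinates.
# NOT the UniDrive implementation; all values below are made-up placeholders.

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def project(point_ego, rotation, translation, intrinsics):
    """Project an ego-frame point into the image plane of one camera.

    rotation, translation: camera extrinsics (ego frame -> camera frame).
    intrinsics: 3x3 pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns (u, v) pixel coordinates, or None if the point lies behind
    the camera.
    """
    # Transform the point into the camera frame.
    p_cam = mat_vec(rotation, point_ego)
    p_cam = [p_cam[i] + translation[i] for i in range(3)]
    if p_cam[2] <= 0:  # behind the image plane
        return None
    # Apply intrinsics, then perspective-divide by depth.
    uvw = mat_vec(intrinsics, p_cam)
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Placeholder camera: identity rotation, no offset, fx = fy = 1000,
# principal point at the center of a hypothetical 1600x900 image.
K = [[1000.0, 0.0, 800.0],
     [0.0, 1000.0, 450.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

print(project([2.0, 1.0, 10.0], R, t, K))  # a point 10 m ahead of the camera
```

Resampling a source view into a shared virtual camera, as Algorithm 1's name suggests, would chain two such mappings (source camera to 3D, 3D to virtual camera); the sketch shows only the forward half.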
Open Source Code: No — The paper includes a URL (https://wzzheng.net/UniDrive), which appears to be a project homepage. However, the text does not explicitly state that source code for the described methodology is released, nor does it provide a direct link to a code repository.
Open Datasets: No — The authors generate multi-view image data and 3D object ground truth in the CARLA simulator (Dosovitskiy et al., 2017), using the maps of Towns 1-6 to collect data. The dataset consists of 500 scenes (20,000 frames) for each camera configuration, is organized in the nuScenes format (Caesar et al., 2020), and is compatible with the nuscenes-devkit Python package for convenient processing. The paper describes generating this dataset and formatting it for nuScenes tooling, but does not explicitly state that the generated dataset is publicly available or provide access information for it.
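For context on the claimed nuScenes compatibility: the standard dataroot layout that the public nuscenes-devkit expects looks roughly like the following (directory and table names are from the public nuScenes format; the CARLA-generated dataset would presumably mirror this, though the paper does not show its layout):

```text
dataroot/
├── samples/            # keyframe sensor data, e.g. samples/CAM_FRONT/*.jpg
├── sweeps/             # intermediate (non-keyframe) sensor data
├── maps/               # map files
└── v1.0-trainval/      # JSON metadata tables, e.g. scene.json, sample.json,
                        # sample_data.json, sample_annotation.json,
                        # sensor.json, calibrated_sensor.json, ego_pose.json
```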
Dataset Splits: Yes — The dataset consists of 500 scenes (20,000 frames) for each camera configuration, split into 250 scenes for training and 250 scenes for validation.
Hardware Specification: No — Due to the extensive computational resources needed to benchmark the multi-camera configurations, the authors only compare their method with the camera variant of BEVFusion (Liu et al., 2023b), abbreviated BEVFusion-C. The paper mentions this resource constraint but does not specify any particular GPU or CPU models, or other hardware details used for the experiments.
Software Dependencies: No — The paper mentions the CARLA simulator (Dosovitskiy et al., 2017) and the nuscenes-devkit Python package, used with its nuScenes-format dataset (Caesar et al., 2020), but does not provide version numbers for these or any other software components.
Experiment Setup: No — The paper describes general experimental settings such as data generation in CARLA, camera configurations, and the detection method used (BEVFusion-C). However, it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings, which are crucial for reproducing the experiments.