DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

Authors: Wencheng Han, Dongqian Guo, Cheng-Zhong Xu, Jianbing Shen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Empirical evaluations demonstrate that our method achieves state-of-the-art accuracy in autonomous driving planning, significantly enhancing the system's reasoning ability. Every driving decision made by the system can be traced back through logs to understand the underlying driving logic, providing a level of transparency and explainability that is unprecedented in autonomous driving systems. ... In our ablation study of the DME-Driver system, we methodically dissected the impact of decision-making effectiveness, as shown in Table 4. We began by assessing the standalone performance of the Executor without Decision-Maker guidance, establishing a baseline. Next, we evaluated the impact of substituting the Decision-Maker's guidance with ground truth language cues, observing potential improvements. Following this, we examined the combined performance of the Decision-Maker and Executor, gauging their collaborative efficiency.
Researcher Affiliation: Academia. SKL-IOTSC, CIS, University of Macau. EMAIL, EMAIL
Pseudocode: No. The paper describes the architecture and functionality of the DME-Driver system and its components (Decision-Maker and Executor) using diagrams and descriptive text, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain an explicit statement about releasing the code for the methodology described, nor does it provide a link to a code repository.
Open Datasets: No. To effectively train the proposed system, a new dataset named the Human-driver Behavior and Decision-making (HBD) dataset has been collected. This dataset encompasses a diverse range of human driver behaviors and their underlying motivations. ... Leveraging both re-annotated datasets and newly collected data, we developed a distinctive dataset that integrates human driver behavior logic with detailed environmental perception. ... To further enhance the LVLM's capabilities, we collected a new sub-dataset with multi-round question-answering conversations.
Dataset Splits: No. The re-annotated data include 591,574 images with 985,739 targets, forming 191,786 conversations. Each conversation contains 2-3 turns of question and answer exchanges. The newly collected data comprise 2,608,038 images with more than 3 million targets in total, forming 207,150 multi-turn conversations. ... we conducted an evaluation using the test set of the HBD dataset. ... The fine-tuning stage then tailors the model to the specific needs of interpretable autonomous driving. Here, the LLM is trained alongside the visual tokenizer using 39K video-text pairs from the proposed HBD Dataset, supplemented with 80K instruction-following image-text pairs from LLaVA (Liu et al. 2023a).
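The reported dataset composition can be summed up as follows; the per-subset figures are quoted from the paper, while the combined totals are simple sums computed here for illustration:

```python
# Reported composition of the HBD dataset (re-annotated + newly collected).
# Per-subset counts are quoted from the paper; combined totals are
# straightforward sums added here for illustration.

reannotated = {"images": 591_574, "targets": 985_739, "conversations": 191_786}
newly_collected = {"images": 2_608_038, "conversations": 207_150}  # >3M targets reported

total_images = reannotated["images"] + newly_collected["images"]
total_conversations = reannotated["conversations"] + newly_collected["conversations"]

print(f"total images: {total_images:,}")               # 3,199,612
print(f"total conversations: {total_conversations:,}")  # 398,936
```

Note that no train/validation/test split proportions are reported for these totals, which is why the variable is marked "No".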
Hardware Specification: Yes. Our experiments are conducted on a workstation with 8x H800 Graphic Cards.
Software Dependencies: No. The paper mentions several models and frameworks such as LLaVA, LLaMA 2, CLIP, RT-2, UniAD, and a BERT-based text encoder, but does not specify their version numbers or any other key software dependencies with specific versions.
Experiment Setup: Yes. Executor Training: The training of the Executor component in our DME-Driver system primarily follows the setup utilized by UniAD (Hu et al. 2023). However, we introduce specific modifications to enhance the system's consistency. Initially, similar to UniAD, we start by jointly training the perception parts, namely the tracking and mapping modules, for six epochs. We then proceed to an end-to-end training phase, which lasts for 20 epochs and encompasses all perception, prediction, and planning modules. To ensure alignment between the output signals of the planning module and the decisions made by the Decision-Maker, we introduce an auxiliary loss during the training of the planning module.
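The quoted setup does not give the exact form of the auxiliary loss. A minimal sketch, assuming it is a mean-squared penalty between the planner's output signal and the Decision-Maker's decision signal weighted by a coefficient `lambda_aux` (both the signal representation and the weight value are illustrative, not taken from the paper):

```python
# Hypothetical sketch of the auxiliary alignment loss described above.
# Assumptions (not from the paper): signals are flat numeric sequences,
# the alignment penalty is MSE, and lambda_aux = 0.1 by default.

def mse(a, b):
    """Mean squared error between two equal-length numeric sequences."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def planner_loss(pred_traj, gt_traj, plan_signal, decision_signal, lambda_aux=0.1):
    """Total planning-stage loss: imitation term plus auxiliary alignment term."""
    imitation = mse(pred_traj, gt_traj)            # match ground-truth waypoints
    alignment = mse(plan_signal, decision_signal)  # agree with the Decision-Maker
    return imitation + lambda_aux * alignment
```

For example, with perfect waypoint imitation the total loss reduces to the weighted alignment term alone: `planner_loss([1.0, 2.0], [1.0, 2.0], [0.0], [2.0], lambda_aux=0.5)` evaluates to `0.5 * 4.0 = 2.0`.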