DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving

Authors: Wencheng Han, Dongqian Guo, Cheng-Zhong Xu, Jianbing Shen

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Empirical evaluations demonstrate that our method achieves state-of-the-art accuracy in autonomous driving planning, significantly enhancing the system's reasoning ability. Every driving decision made by the system can be traced back through logs to understand the underlying driving logic, providing a level of transparency and explainability that is unprecedented in autonomous driving systems. ... In our ablation study of the DME-Driver system, we methodically dissected the impact of decision-making effectiveness, as shown in Table 4. We began by assessing the standalone performance of the Executor without Decision-Maker guidance, establishing a baseline. Next, we evaluated the impact of substituting the Decision-Maker's guidance with ground truth language cues, observing potential improvements. Following this, we examined the combined performance of the Decision-Maker and Executor, gauging their collaborative efficiency.
Researcher Affiliation: Academia. SKL-IOTSC, CIS, University of Macau. EMAIL, EMAIL
Pseudocode: No. The paper describes the architecture and functionality of the DME-Driver system and its components (Decision-Maker and Executor) using diagrams and descriptive text, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain an explicit statement about releasing the code for the methodology described, nor does it provide a link to a code repository.
Open Datasets: No. To effectively train the proposed system, a new dataset named the Human-driver Behavior and Decision-making (HBD) dataset has been collected. This dataset encompasses a diverse range of human driver behaviors and their underlying motivations. ... Leveraging both re-annotated datasets and newly collected data, we developed a distinctive dataset that integrates human driver behavior logic with detailed environmental perception. ... To further enhance the LVLM's capabilities, we collected a new sub-dataset with multi-round question-answering conversations.
Dataset Splits: No. The re-annotated data include 591,574 images with 985,739 targets, forming 191,786 conversations. Each conversation contains 2-3 turns of question and answer exchanges. The newly collected data comprise 2,608,038 images with more than 3 million targets in total, forming 207,150 multi-turn conversations. ... we conducted an evaluation using the test set of the HBD dataset. ... The fine-tuning stage then tailors the model to the specific needs of interpretable autonomous driving. Here, the LLM is trained alongside the visual tokenizer using 39K video-text pairs from the proposed HBD Dataset, supplemented with 80K instruction-following image-text pairs from LLaVA (Liu et al. 2023a).
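The reported dataset composition can be summed up as follows; the per-subset figures are quoted from the paper, while the combined totals are simple sums computed here for illustration:

```python
# Reported composition of the HBD dataset (re-annotated + newly collected).
# Per-subset counts are quoted from the paper; combined totals are
# straightforward sums added here for illustration.

reannotated = {"images": 591_574, "targets": 985_739, "conversations": 191_786}
newly_collected = {"images": 2_608_038, "conversations": 207_150}  # >3M targets reported

total_images = reannotated["images"] + newly_collected["images"]
total_conversations = reannotated["conversations"] + newly_collected["conversations"]

print(f"total images: {total_images:,}")               # 3,199,612
print(f"total conversations: {total_conversations:,}")  # 398,936
```

Note that no train/validation/test split proportions are reported for these totals, which is why the variable is marked "No".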
Hardware Specification: Yes. Our experiments are conducted on a workstation with 8x H800 Graphic Cards.
Software Dependencies: No. The paper mentions several models and frameworks such as LLaVA, LLaMA 2, CLIP, RT-2, UniAD, and a BERT-based text encoder, but does not specify their version numbers or any other key software dependencies with specific versions.
Experiment Setup: Yes. Executor Training: The training of the Executor component in our DME-Driver system primarily follows the setup utilized by UniAD (Hu et al. 2023). However, we introduce specific modifications to enhance the system's consistency. Initially, similar to UniAD, we start by jointly training the perception parts, namely the tracking and mapping modules, for six epochs. We then proceed to an end-to-end training phase, which lasts for 20 epochs and encompasses all perception, prediction, and planning modules. To ensure alignment between the output signals of the planning module and the decisions made by the Decision-Maker, we introduce an auxiliary loss during the training of the planning module.
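The quoted setup does not give the exact form of the auxiliary loss. A minimal sketch, assuming it is a mean-squared penalty between the planner's output signal and the Decision-Maker's decision signal weighted by a coefficient `lambda_aux` (both the signal representation and the weight value are illustrative, not taken from the paper):

```python
# Hypothetical sketch of the auxiliary alignment loss described above.
# Assumptions (not from the paper): signals are flat numeric sequences,
# the alignment penalty is MSE, and lambda_aux = 0.1 by default.

def mse(a, b):
    """Mean squared error between two equal-length numeric sequences."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def planner_loss(pred_traj, gt_traj, plan_signal, decision_signal, lambda_aux=0.1):
    """Total planning-stage loss: imitation term plus auxiliary alignment term."""
    imitation = mse(pred_traj, gt_traj)            # match ground-truth waypoints
    alignment = mse(plan_signal, decision_signal)  # agree with the Decision-Maker
    return imitation + lambda_aux * alignment
```

For example, with perfect waypoint imitation the total loss reduces to the weighted alignment term alone: `planner_loss([1.0, 2.0], [1.0, 2.0], [0.0], [2.0], lambda_aux=0.5)` evaluates to `0.5 * 4.0 = 2.0`.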