Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving
Authors: Yinzhe Shen, Omer Sahin Tas, Kaiwen Wang, Royden Wagner, Christoph Stiller
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the nuScenes (Caesar et al., 2020) dataset showcase the effectiveness of the DMAD structure in mitigating negative transfer. Our approach achieves significant performance gains in perception and prediction, which benefits the planning module and outperforms state-of-the-art (SOTA) E2E AD models. We conduct experiments on the nuScenes (Caesar et al., 2020) dataset to validate the effectiveness of our method. We present results in three parts. The first part focuses on perception (detection, tracking, and mapping). In the second part, we evaluate motion prediction and planning. Lastly, we provide an extensive ablation study and SHAP values (Lundberg & Lee, 2017) visualization. |
| Researcher Affiliation | Academia | Yinzhe Shen¹, Ömer Şahin Taş¹·², Kaiwen Wang¹, Royden Wagner¹, Christoph Stiller¹·² — ¹Karlsruhe Institute of Technology (KIT), ²FZI Research Center for Information Technology |
| Pseudocode | No | The paper describes the architecture and processes (e.g., 'Interactive Semantic Decoder', 'Neural-Bayes Motion Decoder') in detail using prose and diagrams, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available. |
| Open Datasets | Yes | Experiments on the nuScenes (Caesar et al., 2020) dataset showcase the effectiveness of the DMAD structure in mitigating negative transfer. Our approach achieves significant performance gains in perception and prediction, which benefits the planning module and outperforms state-of-the-art (SOTA) E2E AD models. |
| Dataset Splits | No | The paper mentions using the nuScenes dataset and a 'two-stage training scheme' with 'queue length' specifications. While nuScenes is a well-known dataset with predefined splits, the paper does not explicitly state which training, validation, or test splits were used for its experiments, nor does it cite the predefined splits in the context of their usage. |
| Hardware Specification | Yes | Compared to UniAD (Hu et al., 2023), our decoders add 13.1M parameters and increase inference latency by 0.02 seconds on an NVIDIA RTX 6000 Ada. |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) used for implementation or experimentation. |
| Experiment Setup | Yes | Two-stage training. We follow the two-stage training scheme of our baseline. In the first stage, we train object detection, tracking, and mapping. In the second stage, we train all modules together. Notably, because our tracking relies on reference points provided by unimodal prediction, we incorporate unimodal prediction training in the first stage. Multimodal prediction is trained only in the second stage, which is consistent with the baseline. Queue length. Since AD is a time-dependent task, the model typically processes a sequence of consecutive frames as a training sample. The number of input frames, i.e., the queue length q, defines the temporal horizon the model can capture, impacting the performance of related tasks. UniAD employs different queue lengths across its two training stages: 5 in the first stage and 3 in the second. The multi-head self-attention module is configured with 8 heads, an embedding dimension of 256, and a dropout rate of 0.1. The FFN consists of two linear layers with an intermediate ReLU activation, which expands the dimension from 256 to an inner-layer dimension of 512 before projecting it back to 256. |
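The attention/FFN configuration quoted above (8 heads, embedding dimension 256, dropout 0.1, FFN 256 → 512 → 256 with ReLU) can be sketched in PyTorch as follows. This is a minimal illustration of those hyperparameters only; the class name `AttnFFNBlock` and the pre/post-norm layout are assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class AttnFFNBlock(nn.Module):
    """Illustrative block matching the reported hyperparameters:
    8-head self-attention over 256-dim embeddings with dropout 0.1,
    followed by an FFN expanding 256 -> 512 -> 256 with ReLU."""

    def __init__(self, d_model=256, n_heads=8, d_ffn=512, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn),  # 256 -> 512
            nn.ReLU(),
            nn.Linear(d_ffn, d_model),  # 512 -> 256
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connections with post-norm (layout assumed).
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ffn(x))
        return x

# Example: a batch of 2 samples with 10 queries of dimension 256.
queries = torch.randn(2, 10, 256)
out = AttnFFNBlock()(queries)
print(out.shape)  # torch.Size([2, 10, 256])
```

The block preserves the 256-dim query shape, so it could be stacked inside a decoder; how the paper actually wires it into its semantic and motion decoders is not specified in the quoted text.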