EMMA: End-to-End Multimodal Model for Autonomous Driving
Authors: Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, Yin Zhou, James Guo, Dragomir Anguelov, Mingxing Tan
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate EMMA's effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on the Waymo Open Motion Dataset (WOMD). EMMA also yields competitive results for camera-primary 3D object detection on the Waymo Open Dataset (WOD). We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications. |
| Researcher Affiliation | Industry | Contact emails: Mingxing Tan <EMAIL>, Jyh-Jing Hwang <EMAIL>. |
| Pseudocode | No | The paper describes its methodology using descriptive text and mathematical equations (e.g., O = G(T, V)), but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper mentions using an "open-sourced MLLM, PaLI-X (Chen et al., 2024d)" for experiments, which refers to a third-party tool. However, it does not contain any explicit statements or links indicating that the authors' own implementation code for EMMA is open-source or publicly available. |
| Open Datasets | Yes | Overall, we leverage three public datasets, nuScenes (Caesar et al., 2020), Waymo Open Motion Dataset (WOMD) (Chen et al., 2024a), and Waymo Open Dataset (WOD) (Sun et al., 2020). |
| Dataset Splits | No | The paper describes how individual data samples are structured (e.g., for WOMD, "1 second is used as input context, and the remaining 8 seconds serve as the prediction target"; for nuScenes, "predict the next 3 seconds of future driving actions based on 2 seconds of historical data") and refers to the "standard protocol" for public benchmarks. However, it does not explicitly provide the train/validation/test split ratios or sample counts for the datasets used to train the model, nor does it specify custom split files. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing resource specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions models like "Gemini 1.0 Nano-1" and "PaLI-X" but does not list any specific software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages with their version numbers that would be necessary to replicate the experimental setup. |
| Experiment Setup | No | The paper describes some aspects of the training strategy, such as batch sampling for generalist training and top-K decoding for inference. However, it does not provide specific hyperparameters such as learning rates, exact batch sizes, number of training epochs, or optimizer details required to reproduce the experimental setup. |