EMMA: End-to-End Multimodal Model for Autonomous Driving

Authors: Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, Yin Zhou, James Guo, Dragomir Anguelov, Mingxing Tan

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate EMMA's effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on the Waymo Open Motion Dataset (WOMD). EMMA also yields competitive results for camera-primary 3D object detection on the Waymo Open Dataset (WOD). We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications.
Researcher Affiliation | Industry | Contact emails: Mingxing Tan <EMAIL>, Jyh-Jing Hwang <EMAIL>.
Pseudocode | No | The paper describes its methodology using descriptive text and mathematical equations (e.g., O = G(T, V)), but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
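For readers unfamiliar with the formulation cited above, O = G(T, V) says a single multimodal model G maps a task prompt T and visual input V to all outputs O (waypoints, detections, etc.) as text. The following is a minimal interface sketch under that assumption; the `generate` helper and the placeholder model are hypothetical, not the authors' code.

```python
def generate(model, text_prompt, camera_frames):
    # O = G(T, V): the MLLM consumes a task prompt T plus camera input V
    # and decodes every output O as plain text, with no task-specific heads.
    return model(text_prompt, camera_frames)

# Placeholder standing in for a Gemini-style MLLM (illustration only).
dummy = lambda t, v: f"waypoints for '{t}' from {len(v)} frames"
out = generate(dummy, "drive straight", ["frame0", "frame1"])
```

The key design point the paper claims is exactly this uniformity: because every task shares the single text-in/text-out interface, co-training planning, detection, and road-graph tasks needs no architectural changes.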
Open Source Code | No | The paper mentions using an "open-sourced MLLM, PaLI-X (Chen et al., 2024d)" for experiments, which refers to a third-party model. However, it contains no statements or links indicating that the authors' own EMMA implementation is open-source or publicly available.
Open Datasets | Yes | Overall, we leverage three public datasets, nuScenes (Caesar et al., 2020), Waymo Open Motion Dataset (WOMD) (Chen et al., 2024a) and Waymo Open Dataset (WOD) (Sun et al., 2020).
Dataset Splits | No | The paper describes how individual samples are structured (for WOMD, "1 second is used as input context, and the remaining 8 seconds serve as the prediction target"; for nuScenes, "predict the next 3 seconds of future driving actions based on 2 seconds of historical data") and refers to the "standard protocol" of the public benchmarks. However, it does not explicitly report train/validation/test split ratios or sample counts for the datasets used to train the model, nor does it specify custom split files.
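The per-sample windowing quoted above (e.g., WOMD's 1 s of input context followed by an 8 s prediction target) can be sketched as a simple trajectory split; the helper name and the 10 Hz sampling rate are illustrative assumptions, not details from the paper.

```python
def split_trajectory(traj, context_sec, target_sec, hz=10):
    """Split a trajectory into (input context, prediction target) windows.

    `hz` is an assumed sampling rate used only for this illustration.
    """
    n_ctx = int(context_sec * hz)
    n_tgt = int(target_sec * hz)
    assert len(traj) >= n_ctx + n_tgt, "trajectory too short for the split"
    return traj[:n_ctx], traj[n_ctx:n_ctx + n_tgt]

# WOMD-style sample: 1 s context, 8 s target, at the assumed 10 Hz.
traj = list(range(90))
ctx, tgt = split_trajectory(traj, 1, 8)
```

Note this describes how each sample is windowed in time, which is distinct from the missing information the row flags: how the pool of samples is partitioned into train/validation/test sets.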
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other computing resources used for running its experiments.
Software Dependencies | No | The paper mentions models such as "Gemini 1.0 Nano-1" and "PaLI-X" but does not list any specific software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages with version numbers that would be necessary to replicate the experimental setup.
Experiment Setup | No | The paper describes some aspects of the training strategy, such as batch sampling for generalist training and top-K decoding for inference. However, it does not provide specific hyperparameters such as learning rate, exact batch sizes, number of training epochs, or optimizer details needed to reproduce the experimental setup.