DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles

Authors: Chejian Xu, Aleksandr Petiushko, Ding Zhao, Bo Li

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimentation has been conducted to validate the efficacy of our approach. Compared with 6 SOTA baselines, DiffScene generates scenarios that are (1) more safety-critical under different metrics, (2) more realistic under 5 distance functions, and (3) more transferable to different AV algorithms. In addition, we demonstrate that training AV algorithms with scenarios generated by DiffScene leads to significantly higher performance under safety-critical metrics.
Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign; Gatik AI; Carnegie Mellon University
Pseudocode | Yes | The detailed process of DiffScene is shown in Algorithm 1 in Section 9.
Open Source Code | No | The paper mentions using the Carla and GUAM simulators but does not provide any statement or link for the source code of the DiffScene methodology itself.
Open Datasets | No | To train the diffusion model µω, we first construct a benign trajectory dataset in Carla by training several RL models from scratch in benign scenarios, collecting a total of 6,995 trajectories.
Dataset Splits | Yes | For training the safety-critical objective model Jϑ, we generate 5,000 trajectories per scenario setting using the trained diffusion model, calculating J(ω) as the ground truth. Each scenario setting uses 4,000 trajectories for training and 1,000 for testing.
Hardware Specification | No | The paper states, "We use Carla (Dosovitskiy et al. 2017; Xu et al. 2022) as our simulator," but does not specify any hardware (GPU or CPU models, memory, etc.) used for running the experiments or training the models.
Software Dependencies | No | The paper mentions using "Carla (Dosovitskiy et al. 2017; Xu et al. 2022) as our simulator" and "RL algorithms: SAC, PPO (Schulman et al. 2017), and TD3 (Fujimoto, Hoof, and Meger 2018)" but does not provide specific version numbers for these software components or libraries.
Experiment Setup | Yes | For the scenarios generated by each generation algorithm, we use 80% of them as the training set. The remaining 20% of scenarios from all algorithms together form a standard test set. We finetune the target SAC model on the different training sets using 3 different random seeds, each for 500 episodes, and report the averaged testing result on the standard test set.
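The evaluation protocol quoted above (per-algorithm 80% training sets, a pooled 20% standard test set, and results averaged over 3 seeds) can be sketched as follows. This is a minimal illustration of the described split logic only; all function names and signatures are hypothetical and not taken from the DiffScene codebase.

```python
import random

def split_scenarios(scenarios_by_algo, train_frac=0.8, seed=0):
    """Hypothetical sketch: 80% of each generator's scenarios become its
    training set; the remaining 20% from all generators are pooled into
    one shared standard test set, as described in the paper's setup."""
    rng = random.Random(seed)
    train_sets, shared_test = {}, []
    for algo, scenarios in scenarios_by_algo.items():
        pool = list(scenarios)
        rng.shuffle(pool)
        cut = int(train_frac * len(pool))
        train_sets[algo] = pool[:cut]   # per-algorithm training set
        shared_test.extend(pool[cut:])  # pooled standard test set
    return train_sets, shared_test

def evaluate_averaged(finetune, evaluate, train_set, test_set, seeds=(0, 1, 2)):
    """Finetune once per random seed (500 episodes each in the paper)
    and average the test-set results across seeds."""
    results = [evaluate(finetune(train_set, seed=s), test_set) for s in seeds]
    return sum(results) / len(results)
```

The pooled test set keeps the comparison fair: every generation algorithm's finetuned model is scored on the same held-out mix of scenarios rather than on its own distribution.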