Ctrl-V: Higher Fidelity Autonomous Vehicle Video Generation with Bounding-Box Controlled Object Motion

Authors: Ge Ya Luo, ZhiHao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

TMLR 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Extensive experiments conducted on the KITTI, Virtual KITTI 2, BDD100K, and nuScenes datasets validate the effectiveness of our approach in producing realistic and controllable video generation. For quantitative evaluation, we assess the model's performance across four driving datasets on three key aspects: (1) the overall visual quality of the generated results (Section 4.3); (2) the alignment of the predicted bounding-box trajectories with the ground truth (Section 4.2); (3) the fidelity of the generated objects in the video to the bounding-box control signal (Section 4.4).

Researcher Affiliation | Collaboration | Ge Ya Luo (Mila, Université de Montréal); ZhiHao Luo (Mila, Polytechnique Montréal); Anthony Gosselin (Mila, Polytechnique Montréal); Alexia Jolicoeur-Martineau (Samsung SAIT AI Lab, Montreal); Christopher Pal (Mila, Polytechnique Montréal; Canada CIFAR AI Chair)

Pseudocode | No | The paper describes the method using text and diagrams (Figure 1, Figure 2) but does not include any explicitly labeled pseudocode or algorithm blocks.

Open Source Code | Yes | Project page: https://oooolga.github.io/ctrl-v.github.io/

Open Datasets | Yes | We evaluate the performance of our models across four autonomous-vehicle datasets: KITTI (Geiger et al., 2013), Virtual KITTI 2 (vKITTI) (Cabon et al., 2020), the Berkeley Driving Dataset (BDD) (Yu et al., 2020) with multi-object tracking labels (MOT2020), and the nuScenes dataset (Caesar et al., 2019).

Dataset Splits | No | The paper states, 'To assess video quality, we randomly select 200 initial frames from each dataset's testing set and generate videos.' However, it does not explicitly provide the training, validation, and test splits used for the models, nor does it cite predefined splits that would make the data partitioning reproducible.

Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.

Software Dependencies | No | The paper mentions using Stable Video Diffusion (SVD) models, ControlNet, and YOLOv8 (Reis et al., 2024), but it gives no version numbers for these components or for ancillary software such as programming languages or deep-learning frameworks.

Experiment Setup | No | The paper describes the model architecture and general training strategy, such as using the Euler discrete noise-scheduling method and freezing the SVD weights during ControlNet training. However, it does not report specific numerical hyperparameters (learning rate, batch size, number of epochs, optimizer configuration) in the main text.
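The frozen-backbone strategy noted in the Experiment Setup row (SVD weights frozen, only the ControlNet branch trained) can be sketched in plain PyTorch. This is a minimal toy illustration, not the authors' code: the module names, shapes, and residual-addition conditioning are stand-in assumptions for the real SVD UNet and ControlNet architecture.

```python
import torch
import torch.nn as nn

# Toy stand-ins: "backbone" plays the role of the pretrained SVD UNet,
# "control_branch" the ControlNet-style branch conditioned on bounding boxes.
backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
control_branch = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

# Freeze the backbone, as the paper says the SVD weights are frozen.
for p in backbone.parameters():
    p.requires_grad_(False)

# Only the control branch's parameters are optimized.
optimizer = torch.optim.AdamW(control_branch.parameters(), lr=1e-4)

x = torch.randn(4, 8)       # toy noisy latent
cond = torch.randn(4, 8)    # toy bounding-box control signal
target = torch.randn(4, 8)  # toy denoising target

# ControlNet-style residual conditioning: branch output added to backbone output.
pred = backbone(x) + control_branch(cond)
loss = nn.functional.mse_loss(pred, target)
loss.backward()
optimizer.step()

# Gradients reach only the trainable branch; the frozen backbone gets none.
assert all(p.grad is None for p in backbone.parameters())
assert all(p.grad is not None for p in control_branch.parameters())
```

The point of the sketch is the gradient flow: because the backbone's parameters have `requires_grad` disabled, backpropagation through the summed prediction updates only the control branch, which is what makes this fine-tuning recipe cheap relative to retraining the full video model.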