SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field. Extensive experiments on the nuScenes dataset (Caesar et al. 2020) validate the effectiveness of our proposed method. Table 1: Evaluation of data scaling on detection and tracking tasks. Table 5: Ablation studies of different modules in SubjectDrive, with the last row showing the alignment performance on the real validation data.
Researcher Affiliation | Collaboration | Binyuan Huang1*, Yuqing Wen2*, Yucheng Zhao3*, Yaosi Hu4*, Yingfei Liu3, Fan Jia3, Weixin Mao3, Tiancai Wang3, Chi Zhang5, Chang Wen Chen4, Zhenzhong Chen1, Xiangyu Zhang3 — 1Wuhan University, 2University of Science and Technology of China, 3MEGVII Technology, 4The Hong Kong Polytechnic University, 5Mach Drive
Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (Figures 3, 4, 5, 6) to illustrate the architecture and components, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository.
Open Datasets | Yes | Extensive experiments on the nuScenes dataset (Caesar et al. 2020) validate the effectiveness of our proposed method. We use the nuScenes dataset to train SubjectDrive and assess the visual fidelity and controllability of the generated data. The external subject bank is established by integrating external vehicle datasets from the open-source CompCars (Yang et al. 2015) dataset.
Dataset Splits | Yes | We use the nuScenes dataset to train SubjectDrive and assess the visual fidelity and controllability of the generated data. We generated the validation set of nuScenes without applying any pre-processing or post-processing to the selected samples. The internal subject bank is curated by collecting subjects from the training set of the nuScenes dataset.
Hardware Specification | Yes | Experiments are conducted on 8 A100 GPUs using the DDIM sampler with 25 steps to produce 256 × 512 resolution video clips spanning 8 frames.
Software Dependencies | No | The paper mentions several models and frameworks, such as Panacea, CLIP, ControlNet, Latent Diffusion Models, StreamPETR with ResNet50, and the DDIM sampler, but does not provide specific version numbers for any software libraries or dependencies used for implementation.
Experiment Setup | Yes | SubjectDrive adopts a two-stage video generation approach: image generation in the first stage (optimized for 56k steps) and video generation in the second (84k steps). Experiments are conducted on 8 A100 GPUs using the DDIM sampler with 25 steps to produce 256 × 512 resolution video clips spanning 8 frames. The evaluation uses StreamPETR with a ResNet50 backbone (He et al. 2016), trained at 256 × 512 resolution.
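Since no code is released, the quoted sampling setup (deterministic DDIM, 25 steps) can only be illustrated with a minimal sketch. Everything below is an assumption for illustration: `predict_noise` is a toy stand-in for the trained video diffusion model, and the noise schedule and latent shapes are made up, not the authors' values.

```python
import numpy as np

def predict_noise(x, t):
    """Toy stand-in for the trained diffusion UNet: predicts zero noise
    so the DDIM update rule itself can be inspected in isolation."""
    return np.zeros_like(x)

def ddim_sample(x_T, alphas_cumprod, num_steps=25):
    """Deterministic DDIM sampling (eta = 0) over a reduced timestep
    schedule, mirroring the paper's 25-step setting."""
    T = len(alphas_cumprod)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = predict_noise(x, t)
        # Estimate the clean sample x0 from the current noisy sample,
        # then take the deterministic DDIM step toward it.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x

# An 8-frame "clip" with a down-scaled latent shape (frames, H, W, channels);
# the real model would decode such latents to 256 x 512 RGB frames.
alphas_cumprod = np.linspace(0.9999, 0.01, 1000)  # assumed noise schedule
clip = ddim_sample(np.random.randn(8, 32, 64, 4), alphas_cumprod, num_steps=25)
```

With the zero-noise toy predictor the 25 updates telescope to a simple rescaling of the input, which makes the schedule logic easy to sanity-check before swapping in a real denoiser.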