SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control
Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations confirm SubjectDrive's efficacy in generating scalable autonomous driving training data, marking a significant step toward revolutionizing data production methods in this field. Extensive experiments on the nuScenes dataset (Caesar et al. 2020) validate the effectiveness of our proposed method. Table 1: Evaluation of data scaling on detection and tracking tasks. Table 5: Ablation studies of different modules in SubjectDrive, with the last row showing the alignment performance on the real validation data. |
| Researcher Affiliation | Collaboration | Binyuan Huang1*, Yuqing Wen2*, Yucheng Zhao3*, Yaosi Hu4*, Yingfei Liu3, Fan Jia3, Weixin Mao3, Tiancai Wang3, Chi Zhang5, Chang Wen Chen4, Zhenzhong Chen1, Xiangyu Zhang3 1Wuhan University 2University of Science and Technology of China 3MEGVII Technology 4The Hong Kong Polytechnic University 5Mach Drive |
| Pseudocode | No | The paper describes the methodology in narrative text and uses diagrams (Figure 3, 4, 5, 6) to illustrate the architecture and components, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository. |
| Open Datasets | Yes | Extensive experiments on the nuScenes dataset (Caesar et al. 2020) validate the effectiveness of our proposed method. We use the nuScenes dataset to train SubjectDrive and assess the visual fidelity and controllability of the generated data. The external subject bank is established by integrating external vehicle datasets from the open-source CompCars (Yang et al. 2015) dataset. |
| Dataset Splits | Yes | We use the nuScenes dataset to train SubjectDrive and assess the visual fidelity and controllability of the generated data. We generated the validation set of nuScenes without applying any pre-processing or post-processing to the selected samples. The internal subject bank is curated by collecting subjects from the training set of the nuScenes dataset. |
| Hardware Specification | Yes | Experiments are conducted on 8 A100 GPUs using the DDIM sampler with 25 steps to produce 256 × 512 resolution video clips spanning 8 frames. |
| Software Dependencies | No | The paper mentions several models and frameworks like Panacea, CLIP, ControlNet, Latent Diffusion Models, and StreamPETR with ResNet50, and the DDIM sampler, but does not provide specific version numbers for any software libraries or dependencies used for implementation. |
| Experiment Setup | Yes | SubjectDrive adopts a two-stage video generation approach: image generation in the first stage (optimized for 56k steps) and video generation in the second (84k steps). Experiments are conducted on 8 A100 GPUs using the DDIM sampler with 25 steps to produce 256 × 512 resolution video clips spanning 8 frames. The evaluation uses StreamPETR with a ResNet50 backbone (He et al. 2016), trained at 256 × 512 resolution. |
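For reproduction purposes, the experiment-setup values quoted in the table can be collected into a single configuration record. The sketch below is our own, hypothetical structure (the class and field names are not from the paper); the numeric values are the ones the paper reports:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectDriveSetup:
    """Reported SubjectDrive training/inference settings (values from the paper)."""
    stage1_image_steps: int = 56_000      # first stage: image generation
    stage2_video_steps: int = 84_000      # second stage: video generation
    num_gpus: int = 8                     # A100 GPUs
    sampler: str = "DDIM"
    sampler_steps: int = 25
    resolution: tuple[int, int] = (256, 512)  # (height, width)
    frames_per_clip: int = 8
    detector_backbone: str = "ResNet50"   # StreamPETR evaluation backbone


setup = SubjectDriveSetup()
print(setup.sampler, setup.sampler_steps, setup.resolution)
```

Keeping these values in one frozen record makes it easy to spot which hyperparameters a reproduction attempt would still need to supply (e.g. learning rates and batch sizes, which the paper's quoted setup does not state).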