Paper WeBT2.5
Fu, Yongjie (Columbia University), Li, Yunlong (Columbia University), Di, Xuan (Columbia University)
GenDDS: Generating Diverse Driving Video Scenarios with Prompt-To-Video Generative Model
Scheduled for presentation during the Regular Session "Sensing, Vision, and Perception II" (WeBT2), Wednesday, September 25, 2024,
15:50−16:10, Salon 5
2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada
This information is tentative and subject to change. Compiled on October 3, 2024
Keywords: Other Theories, Applications, and Technologies; Sensing, Vision, and Perception
Abstract
Autonomous driving training requires diverse datasets that encompass various traffic conditions, weather scenarios, and road types. Traditional data augmentation methods often struggle to generate datasets that represent rare occurrences. To address this challenge, we propose GenDDS, a novel approach for generating driving scenarios by leveraging the capabilities of Stable Diffusion XL (SDXL), an advanced latent diffusion model. Our methodology uses descriptive prompts to guide the synthesis process toward realistic and diverse driving scenarios. Leveraging recent computer vision techniques such as ControlNet and Hotshot-XL, we build a complete video generation pipeline around SDXL. We train the model on the KITTI dataset, which includes real-world driving videos. Through a series of experiments, we demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios. This research contributes to the development of sophisticated training data for autonomous driving systems and opens new avenues for creating virtual environments for simulation and validation purposes.
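For illustration, the snippet below is a minimal sketch, not the authors' released code, of the kind of prompt-guided, ControlNet-conditioned SDXL generation step the described pipeline builds on. The model IDs, prompt text, and conditioning image path are assumed placeholders, and Hotshot-XL's temporal layers, which the paper uses to extend single-frame generation to video, are not shown.

```python
# Minimal sketch of prompt-to-image generation with SDXL + ControlNet (diffusers),
# assuming placeholder model IDs, prompt, and a precomputed depth map from a
# KITTI-style frame. Hotshot-XL's video extension is omitted.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# ControlNet supplies spatial conditioning (here: depth) so generated frames keep
# a road-scene layout consistent with the reference driving frame.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Descriptive prompt steering the scenario (weather, road type, traffic condition).
prompt = "dashcam view, highway at dusk, heavy rain, dense traffic, photorealistic"
depth_map = load_image("kitti_frame_depth.png")  # placeholder conditioning image

frame = pipe(
    prompt=prompt,
    image=depth_map,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=30,
).images[0]
frame.save("generated_frame.png")
```

In the full pipeline described in the abstract, consecutive frames would additionally be tied together by temporal attention (as in Hotshot-XL) to yield coherent driving video rather than independent images.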