ITSC 2024 Paper Abstract

Paper WeBT5.3

Li, Jiusi (Tsinghua University), Wen, Tuopu (Tsinghua University), Jiang, Kun (Tsinghua University), Miao, Jinyu (Tsinghua University), Shi, Yining (Tsinghua University), Zhao, Xuhe (Tsinghua University), Fan, Zhi-Gang (Zongmu Technologies), Yang, Diange (State Key Laboratory of Automotive Safety and Energy, Collaborat)

VC-Gaussian: Vision-Centric Gaussian Splatting for Dynamic Autonomous Driving Scenes

Scheduled for presentation during the Invited Session "Driving the Edge: Addressing Corner Cases in Self-driving Vehicles" (WeBT5), Wednesday, September 25, 2024, 15:10−15:30, Salon 13

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

This information is tentative and subject to change. Compiled on April 25, 2025

Keywords: Sensing, Vision, and Perception; Simulation and Modeling

Abstract

Reconstructing dynamic traffic scenes has a wide range of applications in the development of modern autonomous driving systems. Recently, novel-view synthesis techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D GS) have emerged as promising paradigms for 3D scene reconstruction. However, previous works in this area rely heavily on LiDAR to provide accurate geometric priors and cross-frame motion cues for reconstructing dynamic objects, which hinders the use of the abundant vision data from mass-production vehicles. In this paper, we propose VC-Gaussian, a novel vision-centric reconstruction framework based on 3D GS that enables high-quality novel-view synthesis and dynamic scene reconstruction for autonomous driving. A composite Gaussian model is designed to represent the background and foreground objects separately. To facilitate the initialization and optimization of Gaussians without LiDAR, we leverage easy-to-obtain monocular geometric priors, including metric depth and surface normals. Experimental results on a real-world autonomous driving dataset demonstrate that our method outperforms other reconstruction methods that use monocular vision inputs and is even competitive with LiDAR-based methods.
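
As a rough illustration of how a monocular metric depth map could stand in for LiDAR points when seeding Gaussian centers, the sketch below back-projects each valid pixel into the camera frame with a standard pinhole model. This is not the authors' implementation: the function name `backproject_depth`, the `stride` subsampling parameter, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    stride: int = 1,
) -> List[Point]:
    """Lift a metric depth map to 3D points in the camera frame.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), all in pixels, and depth
    values in metres. Pixels with non-positive depth are treated as
    invalid and skipped.
    """
    points: List[Point] = []
    for v in range(0, len(depth), stride):        # image row index
        row = depth[v]
        for u in range(0, len(row), stride):      # image column index
            z = row[u]
            if z <= 0.0:
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    toy_depth = [[2.0, 2.0], [0.0, 4.0]]
    for p in backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

In a full pipeline, the resulting points would still need to be transformed into a common world frame using the camera pose, and the normal prior mentioned in the abstract would be handled separately; neither step is shown here.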