ITSC 2025 Paper Abstract

Paper TH-LM-T28.3

Liu, Weimin (Tsinghua University), Wang, Wenjun (Tsinghua University), Meng, Joshua Huadong (UC Berkeley)

Enhancing Self-Supervised Surround-View Fisheye Depth Estimation with Road Geometry and Foreground Semantics

Scheduled for presentation during the Regular Session "S28a-Multi-Sensor Fusion and Perception for Robust Autonomous Driving" (TH-LM-T28), Thursday, November 20, 2025, 11:10−11:30, Stradbroke

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords Deep Learning for Scene Understanding and Semantic Segmentation in Autonomous Vehicles, Advanced Sensor Fusion for Robust Autonomous Vehicle Perception, Sensor Integration and Calibration for Accurate Localization in Dynamic Road Conditions

Abstract

Surround-view depth estimation is crucial for autonomous driving, enabling comprehensive 3D perception from camera-based systems. While recent self-supervised methods have reduced reliance on LiDAR by learning from multi-view pinhole images, they overlook the fisheye-based surround systems widely adopted in passenger vehicles for near-field perception. Fisheye cameras introduce strong distortion and a different geometric interpretation of depth, posing significant challenges to existing frameworks. To address this gap, we propose SFDNet, a self-supervised depth estimation baseline for surround fisheye cameras. To alleviate data scarcity, we augment the training data by varying the viewing direction of the fisheye cameras. Furthermore, we propose a novel temporal warping approach that incorporates indirect ego-motion estimation into view synthesis. In addition, we leverage geometric and semantic cues from road surfaces and foreground structures to improve depth prediction accuracy. Experiments on the calibrated SynWoodScape dataset validate the effectiveness of SFDNet across diverse viewing perspectives. The proposed method is expected to mitigate limitations of LiDAR sensors, such as self-shadowing and near-field blind spots, offering more complete and reliable depth perception in close-range urban scenarios.
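The abstract notes that fisheye cameras carry a different geometric interpretation of depth than pinhole cameras. As an illustration only (the paper's actual camera model is not specified here), the sketch below uses the common equidistant fisheye model, where image radius grows with the ray's incidence angle (r = f·θ) rather than with tan(θ), and where a depth network would naturally predict radial distance along the viewing ray instead of planar Z-depth. All function names and parameters are hypothetical.

```python
import numpy as np

def fisheye_project(points, f, cx, cy):
    """Project 3D camera-frame points (N, 3) to pixels (N, 2) with the
    equidistant fisheye model: image radius r = f * theta, where theta is
    the angle between the ray and the optical axis (pinhole uses f * tan(theta))."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(np.hypot(X, Y), Z)   # incidence angle from optical axis
    phi = np.arctan2(Y, X)                  # azimuth around the axis
    r = f * theta
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)

def fisheye_unproject(pixels, radial_depth, f, cx, cy):
    """Invert the projection: recover 3D points from pixels (N, 2) and
    *radial* depth (N,) -- the distance along the viewing ray, which is the
    quantity a fisheye depth network would typically regress."""
    u, v = pixels[:, 0] - cx, pixels[:, 1] - cy
    r = np.hypot(u, v)
    theta = r / f
    phi = np.arctan2(v, u)
    # Unit ray direction in the camera frame, scaled by radial depth.
    direction = np.stack([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)], axis=1)
    return direction * radial_depth[:, None]
```

In a self-supervised view-synthesis pipeline of the kind the abstract describes, pairs like these would replace the pinhole projection/unprojection when warping one frame into another: unproject target pixels with predicted radial depth, transform by the estimated ego-motion, and re-project into the source fisheye view to compute a photometric loss.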

All Content © PaperCept, Inc.

