ITSC 2025 Paper Abstract


Paper TH-EA-T23.2

Liu, Zhichao (Southeast University), Wang, Ziwei (Southeast University), Geng, Keke (Southeast University), Cheng, Xiaolong (Southeast University), Liang, Jinhao (Southeast University), Yin, Guodong (Southeast University), Ma, Tianxiao (Southeast University), Sun, Ye (Southeast University)

AlignOcc: Alignment-Aware LiDAR-Camera Fusion for 3D Occupancy Prediction in Autonomous Driving

Scheduled for presentation during the Invited Session "S23b-Trustworthy AI for Traffic Sensing and Control" (TH-EA-T23), Thursday, November 20, 2025, 13:50−14:10, Coolangatta 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords Advanced Sensor Fusion for Robust Autonomous Vehicle Perception, Deep Learning for Scene Understanding and Semantic Segmentation in Autonomous Vehicles, Lidar-based Mapping and Environmental Perception for ITS Applications

Abstract

Comprehensive modeling of real-world autonomous driving scenarios is critical for intelligent transportation systems. Multi-modal fusion-based 3D occupancy prediction methods effectively address the limitations of conventional 2D object detection tasks in perceiving irregularly shaped obstacles and unknown object categories. However, most existing methods insufficiently leverage the rich semantic and geometric information embedded in raw data. Moreover, current multi-sensor fusion approaches often neglect the inherent misalignment between LiDAR and camera modalities, thereby compromising perception accuracy. This paper introduces AlignOcc, a novel LiDAR-camera fusion-based framework for 3D occupancy prediction. It achieves a tightly coupled multi-modal representation via a geometry-semantic framework, promoting fine-grained fusion of structural and semantic information. In addition, we design an Alignment-Aware Fusion Module that performs global alignment between the two modalities via bidirectional dynamic offsets. Extensive experiments on the nuScenes dataset demonstrate that our method achieves superior performance.
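
The abstract's mention of "bidirectional dynamic offsets" suggests a deformable-sampling style of cross-modal alignment. Below is a minimal PyTorch sketch of such a mechanism, assuming 2D BEV feature maps of equal resolution for both modalities; the class name AlignmentAwareFusion and all internal choices (offset heads, grid-sample warping, 1x1 fusion conv) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlignmentAwareFusion(nn.Module):
    """Illustrative sketch: warp each modality's BEV features toward the
    other with learned per-pixel offsets (both directions), then fuse."""

    def __init__(self, channels: int):
        super().__init__()
        # One (dx, dy) offset field per direction, predicted from the
        # concatenated features of both modalities.
        self.offset_cam = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.offset_lidar = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    @staticmethod
    def _warp(feat: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
        # Sampling grid = identity grid + predicted offsets, both in
        # normalized [-1, 1] grid coordinates expected by grid_sample.
        b, _, h, w = feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat.device),
            torch.linspace(-1, 1, w, device=feat.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = base + offset.permute(0, 2, 3, 1)
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, f_cam: torch.Tensor, f_lidar: torch.Tensor) -> torch.Tensor:
        joint = torch.cat((f_cam, f_lidar), dim=1)
        # Bidirectional: each modality is dynamically aligned to the other.
        f_cam_aligned = self._warp(f_cam, self.offset_cam(joint))
        f_lidar_aligned = self._warp(f_lidar, self.offset_lidar(joint))
        return self.fuse(torch.cat((f_cam_aligned, f_lidar_aligned), dim=1))


# Example: fuse hypothetical 64-channel BEV features on a 128x128 grid.
fusion = AlignmentAwareFusion(channels=64)
fused = fusion(torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128))
print(fused.shape)  # torch.Size([2, 64, 128, 128])
```

The bidirectional design mirrors the abstract's claim that alignment is performed globally between the two modalities rather than correcting only one of them; how AlignOcc computes or applies its offsets in practice is described in the full paper.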

