ITSC 2025 Paper Abstract

Paper VP-VP.5

Chen, Zhongzhang (NavInfo Co., Ltd.), Fan, Miao (Beijing Institute of Graphic Communication), Xu, Shengtong (Autohome Inc.), Yang, Mengmeng (Tsinghua University), Jiang, Kun (Tsinghua University), Liu, Xiangzeng (Xidian University), Xiong, Haoyi (Baidu Inc.)

Multimodal HD Mapping for Intersections by Intelligent Roadside Units

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00–18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change.

Keywords: Lidar-based Mapping and Environmental Perception for ITS Applications; Infrastructure Requirements for Connected and Automated Vehicles

Abstract

High-definition (HD) semantic mapping of complex intersections poses significant challenges for traditional vehicle-based approaches due to occlusions and limited perspectives. This paper introduces a novel camera-LiDAR fusion framework that leverages elevated intelligent roadside units (IRUs). Additionally, we present RS-seq, a comprehensive dataset developed through the systematic enhancement and annotation of the V2X-Seq dataset. RS-seq includes precisely labelled camera imagery and LiDAR point clouds collected from roadside installations, along with vectorized maps for seven intersections annotated with detailed features such as lane dividers, pedestrian crossings, and stop lines. This dataset facilitates the systematic investigation of cross-modal complementarity for HD map generation using IRU data. The proposed fusion framework employs a two-stage process that integrates modality-specific feature extraction and cross-modal semantic integration, capitalizing on the high-resolution texture of camera imagery and the precise geometry of LiDAR point clouds. Quantitative evaluations on the RS-seq dataset demonstrate that our multimodal approach consistently surpasses unimodal methods: it improves the mean Intersection-over-Union (mIoU) for semantic segmentation by 4% over the image-only baseline and by 18% over the point-cloud-only baseline. This study establishes a baseline methodology for IRU-based HD semantic mapping and provides a valuable dataset for future research in infrastructure-assisted autonomous driving systems.
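
The abstract describes the method only at a high level: a first stage of modality-specific feature extraction and a second stage of cross-modal semantic integration. As an illustration of that general pattern (not the authors' implementation), the following PyTorch sketch fuses camera and LiDAR features that are assumed to already live on a shared bird's-eye-view (BEV) grid. All names and dimensions (TwoStageFusionHead, cam_proj, fused_channels, num_classes, and so on) are hypothetical placeholders.

    # Hypothetical two-stage camera-LiDAR fusion head for BEV semantic
    # segmentation; an illustrative sketch, not the paper's code.
    import torch
    import torch.nn as nn

    class TwoStageFusionHead(nn.Module):
        def __init__(self, cam_channels=256, lidar_channels=128,
                     fused_channels=256, num_classes=4):
            super().__init__()
            # Stage 1 (assumed): modality-specific projections of camera and
            # LiDAR BEV features into a common channel width.
            self.cam_proj = nn.Conv2d(cam_channels, fused_channels, 1)
            self.lidar_proj = nn.Conv2d(lidar_channels, fused_channels, 1)
            # Stage 2 (assumed): cross-modal semantic integration, here the
            # simplest possible concatenate-and-convolve block.
            self.fuse = nn.Sequential(
                nn.Conv2d(2 * fused_channels, fused_channels, 3, padding=1),
                nn.BatchNorm2d(fused_channels),
                nn.ReLU(inplace=True),
            )
            # Per-cell classifier over map elements (e.g. lane divider,
            # pedestrian crossing, stop line, background).
            self.classifier = nn.Conv2d(fused_channels, num_classes, 1)

        def forward(self, cam_bev, lidar_bev):
            # cam_bev:   (B, cam_channels, H, W) camera features in BEV
            # lidar_bev: (B, lidar_channels, H, W) LiDAR features in BEV
            fused = self.fuse(torch.cat(
                [self.cam_proj(cam_bev), self.lidar_proj(lidar_bev)], dim=1))
            return self.classifier(fused)  # (B, num_classes, H, W) logits

Concatenation followed by convolution is deliberately the simplest stand-in for the integration stage; the abstract does not specify whether the paper uses attention, gating, or another fusion mechanism.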

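The reported gains are in mean Intersection-over-Union (mIoU), the standard semantic-segmentation metric: per class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), averaged over all classes. A minimal reference computation of this standard definition (not code from the paper):

    # Standard mIoU over integer class maps; illustrative only.
    import numpy as np

    def mean_iou(pred, target, num_classes):
        """pred and target are integer class maps of identical shape."""
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, target == c).sum()
            union = np.logical_or(pred == c, target == c).sum()
            if union > 0:  # skip classes absent from both maps
                ious.append(inter / union)
        return float(np.mean(ious))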