ITSC 2025 Paper Abstract

Paper VP-VP.20

Liu, Pei (The Hong Kong University of Science and Technology (Guangzhou)), Zhang, Zihao (Southeast University), Liu, Haipeng (Shanghai Li Auto Co., Ltd.), Li, Yiqin (Southeast University), Chen, Junlan (Monash University)

Roadside Monocular 3D Detection via Cross-View Semantic Alignment

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00−18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on April 2, 2026

Keywords: Real-time Object Detection and Tracking for Dynamic Traffic Environments, IoT-based Traffic Sensors and Real-time Data Processing Systems

Abstract

Roadside perception systems offer unique advantages for autonomous driving: their elevated sensor placement provides a wider field of view and less occlusion than vehicle-mounted sensors. However, monocular 3D detection in this setting faces understudied challenges: (i) the diversity of roadside camera configurations (varying focal lengths and pitch angles) disrupts the spatial consistency between 2D observations and 3D world coordinates, and (ii) perspective distortion from oblique viewing angles amplifies depth ambiguity for distant traffic participants. To address these issues, we propose a robust vision-based framework that builds geometry-aware feature representations through two key innovations: a Deformable Height-Context Alignment module that adaptively fuses multi-scale visual cues with elevation priors using learnable spatial offsets, and a Global Voxel Transformer that models long-range dependencies in bird's-eye-view (BEV) space to mitigate projective ambiguity. Extensive experiments on the Rope3D and DAIR-V2X-I datasets show that the proposed method achieves superior detection performance for both vehicles and cyclists, indicating that it is robust and generalizes well across diverse detection scenarios. More accurate roadside 3D object detection contributes to safe and trustworthy intelligent transportation systems based on vehicle-infrastructure cooperation and supports the large-scale deployment of autonomous driving.
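The two geometric challenges named in the abstract, (i) sensitivity to camera configuration and (ii) depth ambiguity for distant objects, can be made concrete with a flat-ground pinhole model. The sketch below is an illustration of that geometry only; it does not reproduce the Deformable Height-Context Alignment module or the Global Voxel Transformer. It assumes an idealised camera with no roll, a point on the image's centre column, perfectly flat ground, and hypothetical rig values (6 m mounting height, 1000 px focal length, principal-point row 540, 10° downward pitch). The function names are illustrative, not from the paper.

```python
import math


def ground_distance(v, f, v0, cam_height, pitch_deg):
    """Horizontal distance (m) to the ground point seen at image row v.

    Assumes an idealised pinhole camera on the centre column (u = u0),
    mounted cam_height metres above flat ground and pitched down by
    pitch_deg degrees. Image rows increase downward.
    """
    # Angle of the viewing ray below the horizon: camera pitch plus the
    # angle subtended by the row offset from the principal point.
    alpha = math.radians(pitch_deg) + math.atan((v - v0) / f)
    if alpha <= 0:
        raise ValueError("ray does not intersect the ground plane")
    return cam_height / math.tan(alpha)


def depth_error_per_pixel(v, f, v0, cam_height, pitch_deg):
    """Change in ground distance caused by a one-row localisation error."""
    farther = ground_distance(v, f, v0, cam_height, pitch_deg)
    nearer = ground_distance(v + 1, f, v0, cam_height, pitch_deg)
    return farther - nearer


if __name__ == "__main__":
    # Hypothetical roadside rig (assumed values, not from the paper).
    rig = dict(f=1000.0, v0=540.0, cam_height=6.0, pitch_deg=10.0)

    # (ii) Rows closer to the horizon map to farther points, and a
    # one-row error there moves the estimate much more.
    for v in (900, 700, 500, 400):
        d = ground_distance(v, **rig)
        err = depth_error_per_pixel(v, **rig)
        print(f"row {v}: distance {d:6.1f} m, 1-row error {err:5.2f} m")

    # (i) The same pixel row maps to different distances when only the
    # mounting pitch changes.
    for pitch in (8.0, 10.0, 12.0):
        d = ground_distance(700, f=1000.0, v0=540.0,
                            cam_height=6.0, pitch_deg=pitch)
        print(f"pitch {pitch:4.1f} deg: row 700 -> {d:5.1f} m")
```

Under these assumptions a one-row error costs only centimetres for a point about 10 m from the pole but several metres for a point near the horizon, and a change of a few degrees in pitch shifts where a given row lands on the ground. A model trained on one camera configuration therefore cannot reuse the same 2D-to-3D mapping on another, which is the inconsistency the abstract describes.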