ITSC 2024 Paper Abstract

Paper WeBT5.2

Zhong, Jiaru (Beijing Institute of Technology), Yu, Haibao (The University of Hong Kong), Zhu, Tianyi (Beijing Institute of Technology), Xu, Jiahui (Beijing Institute of Technology), Yang, Wenxian (Tsinghua University), Nie, Zaiqing (Tsinghua University), Sun, Chao (Beijing Institute of Technology)

Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

Scheduled for presentation during the Invited Session "Driving the Edge: Addressing Corner Cases in Self-driving Vehicles" (WeBT5), Wednesday, September 25, 2024, 14:50−15:10, Salon 13

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

This information is tentative and subject to change. Compiled on November 2, 2025

Keywords: Sensing, Vision, and Perception; Cooperative Techniques and Systems; Communications and Protocols in ITS

Abstract

Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. However, cooperative perception still faces numerous challenges, including limited communication bandwidth and communication interruptions in practice. In this paper, we propose CTCE, a novel framework for cooperative 3D object detection. The framework transmits queries enhanced with temporal contexts, balancing transmission efficiency and performance to accommodate real-world communication conditions. Additionally, we propose a temporal-guided fusion module to further improve performance. The roadside temporal enhancement and vehicle-side spatial-temporal fusion together constitute a multi-level temporal context integration mechanism that fully leverages temporal information. Furthermore, a motion-aware reconstruction module is introduced to recover roadside queries lost to communication interruptions. Experimental results on the V2X-Seq and V2X-Sim datasets demonstrate that CTCE outperforms the baseline QUEST, achieving mAP improvements of 3.8% and 1.3%, respectively. Experiments under communication interruption conditions further validate CTCE's robustness.