ITSC 2024 Paper Abstract


Paper WeBT13.12

Zhang, Yangjing (Xi'an Jiaotong University), Shen, Yanqing (Xi'an Jiaotong University), Zhu, Ziyu (Xi'an Jiaotong University), Hai, Renwei (Xi'an Jiaotong University), Chen, Shitao (Xi'an Jiaotong University, Xi'an, China), Zheng, Nanning (Xi'an Jiaotong University)

EFormer-VPR: Fusing Events and Frames with Transformer for Visual Place Recognition

Scheduled for presentation during the Poster Session "Transformer networks" (WeBT13), Wednesday, September 25, 2024, 14:30−16:30, Foyer

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

This information is tentative and subject to change.

Keywords: Multi-modal ITS, Network Modeling, Sensing, Vision, and Perception

Abstract

Visual place recognition (VPR) is a challenging task for mobile robots and autonomous driving systems. In scenarios with glare or high-speed motion, image blur makes it difficult for traditional cameras to deliver reliable and accurate place recognition. In contrast, event cameras capture target motion without blur in high-speed scenes, but lack texture information in low-speed scenes. To leverage the complementary characteristics of these two sensors and improve the performance and robustness of VPR algorithms, we propose EFormer-VPR, which fuses target motion events and frames with a transformer. The method first preprocesses the event stream within an adaptive time window using a clustering method, then uses transformer-based networks to extract features from motion frames and events separately and fuses them through a scoring module. Finally, features are aggregated with a VLAD layer, and the whole pipeline is supervised by a triplet ranking loss. To verify the effectiveness of the proposed algorithm, we compare it with other VPR methods on event-based driving datasets (Brisbane-Event-VPR, NeuroGPR) containing challenging scenarios. Experimental results show that our method achieves state-of-the-art performance on both datasets.
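
For readers who want a concrete picture of the pipeline stages named in the abstract (score-based fusion of the two branches, VLAD aggregation, triplet ranking loss), the PyTorch sketch below illustrates one plausible arrangement. The module names, dimensions, and the exact form of the scoring fusion are assumptions for illustration, not the authors' released implementation.

# Minimal PyTorch sketch of the fusion, aggregation, and loss stages described
# in the abstract. All names, dimensions, and the score-based fusion design are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScoreFusion(nn.Module):
    """Weight frame and event token features with learned per-modality scores."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, frame_feat, event_feat):
        # frame_feat, event_feat: (B, N, D) tokens from the two transformer branches
        s = torch.cat([self.score(frame_feat).mean(1),
                       self.score(event_feat).mean(1)], dim=1)   # (B, 2)
        w = torch.softmax(s, dim=1)
        return w[:, 0, None, None] * frame_feat + w[:, 1, None, None] * event_feat

class NetVLAD(nn.Module):
    """Standard NetVLAD-style aggregation producing a global place descriptor."""
    def __init__(self, dim, num_clusters=64):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.assign = nn.Linear(dim, num_clusters)

    def forward(self, x):                                   # x: (B, N, D)
        a = torch.softmax(self.assign(x), dim=-1)           # soft assignments (B, N, K)
        resid = x.unsqueeze(2) - self.centroids             # residuals (B, N, K, D)
        vlad = (a.unsqueeze(-1) * resid).sum(dim=1)         # (B, K, D)
        vlad = F.normalize(vlad, dim=-1).flatten(1)         # intra-normalize, flatten
        return F.normalize(vlad, dim=-1)                    # L2-normalized descriptor

# Triplet ranking loss over (query, positive, negative) descriptors,
# as stated in the abstract; the margin value here is arbitrary.
triplet_loss = nn.TripletMarginLoss(margin=0.1)

In such a setup, each branch would be a transformer encoder over the motion frames and the clustered event representation respectively, and retrieval would be performed by nearest-neighbor search over the resulting descriptors.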

 

 
