ITSC 2024 Paper Abstract

Paper FrBT5.3

Li, Yuxin (Nanyang Technological University), Li, Yiheng (Nanyang Technological University), Yang, Xulei (Institute for Infocomm Research (I2R), Agency for Science, Techn), Yu, Mengying (Desay SV Automotive), Huang, Zihang (Desay SV Automotive), Wu, Xiaojun (Desay SV Automotive), Yeo, Chai Kiat (Nanyang Technological University)

Learning Content-Aware Multi-Modal Joint Input Pruning Via Bird’s-Eye-View Representation

Scheduled for presentation during the Regular Session "Sensing, Vision, and Perception VI" (FrBT5), Friday, September 27, 2024, 14:10−14:30, Salon 13

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

This information is tentative and subject to change. Compiled on December 26, 2024

Keywords: Sensing, Vision, and Perception; Data Mining and Data Analysis

Abstract

In the landscape of autonomous driving, Bird’s-Eye-View (BEV) representation has recently garnered substantial attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. The BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructure, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their introduction into the perception model’s backbone. We validate the efficacy of our approach through extensive experiments on the nuScenes dataset, demonstrating substantial gains in computational efficiency without sacrificing perception accuracy. To the best of our knowledge, this work represents the first attempt to alleviate the computational burden from the perspective of input pruning.
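The abstract does not spell out how the pruning is computed, so the following is only an illustrative sketch of the general idea of using a BEV grid as a shared anchor across modalities, not the authors' method. It assumes a hand-set per-cell importance grid (in the paper this would presumably be learned), a fixed threshold, and a precomputed mapping from camera patches to the BEV cells they cover. All function names, parameters, and values here are hypothetical.

```python
from math import floor


def bev_cell(x, y, x_min, y_min, cell_size):
    """Map a metric (x, y) position to integer (row, col) BEV grid indices."""
    return floor((y - y_min) / cell_size), floor((x - x_min) / cell_size)


def prune_lidar_points(points, importance, x_min, y_min, cell_size, threshold):
    """Keep only points whose BEV cell has importance >= threshold.

    Points that fall outside the grid are dropped.
    """
    rows, cols = len(importance), len(importance[0])
    kept = []
    for x, y, z in points:
        r, c = bev_cell(x, y, x_min, y_min, cell_size)
        if 0 <= r < rows and 0 <= c < cols and importance[r][c] >= threshold:
            kept.append((x, y, z))
    return kept


def prune_camera_patches(patch_to_cells, importance, threshold):
    """Keep ids of image patches that cover at least one important BEV cell.

    patch_to_cells is an assumed, precomputed mapping (for example from
    camera calibration) of patch id -> list of (row, col) BEV cells.
    """
    return [
        patch_id
        for patch_id, cells in patch_to_cells.items()
        if any(importance[r][c] >= threshold for r, c in cells)
    ]


# Toy 2x2 BEV importance grid with 1 m cells starting at the origin
# (hand-set for illustration only).
importance = [
    [0.9, 0.1],
    [0.2, 0.8],
]

points = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.0), (1.5, 1.5, 1.0), (5.0, 5.0, 0.0)]
kept_points = prune_lidar_points(points, importance, 0.0, 0.0, 1.0, 0.5)

patch_to_cells = {0: [(0, 0)], 1: [(0, 1), (1, 0)], 2: [(1, 1)]}
kept_patches = prune_camera_patches(patch_to_cells, importance, 0.5)

print(kept_points)
print(kept_patches)
```

Because both modalities are filtered against the same grid, a region judged unimportant in BEV is removed from the LiDAR points and the camera patches alike before either reaches a backbone, which is where the computational saving would come from in a scheme of this kind.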