ITSC 2024 Paper Abstract


Paper ThAT6.1

Yin, Jianwen (University of Chinese Academy of Sciences), Jiang, Zhengmin (University of Chinese Academy of Sciences), Liang, Qingyi (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Peng, Lei (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), Zhu, Fenghua (Institute of Automation, Chinese Academy of Sciences), Liu, Jia (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), Li, HuiYun (Shenzhen Institute of Advanced Technology)

Heterogeneous Information Fusion-Based Distributional Reinforcement Learning for Autonomous Driving

Scheduled for presentation during the Regular Session "Driving based on reinforcement learning" (ThAT6), Thursday, September 26, 2024, 10:30−10:50, Salon 14

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

This information is tentative and subject to change. Compiled on October 8, 2024

Keywords: Automated Vehicle Operation, Motion Planning, Navigation

Abstract

Autonomous driving has made significant strides in recent years, with reinforcement learning emerging as a promising approach for developing capable driving policies in urban traffic scenarios. Nevertheless, one challenge encountered when applying reinforcement learning is that Q-value overestimation leads to suboptimal and unstable driving policies. To address this challenge, we propose a novel distributional reinforcement learning method that incorporates implicit quantiles into the actor-critic framework, thereby enabling a more accurate estimation of Q-values. Another issue is sample inefficiency. To enhance the representation learning of urban traffic scenarios and improve sample efficiency, we introduce a temporal-wise attention-based model that effectively aggregates heterogeneous types of state information. Through extensive experiments, our approach demonstrates superior performance compared to the baselines on the NoCrash and CoRL benchmarks. The results show that our proposed method not only learns improved policies but also surpasses the baselines in dense traffic scenarios, while obtaining comparable performance in the other traffic scenarios.
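The implicit-quantile critic the abstract describes builds on standard distributional RL machinery: rather than regressing a single Q-value, the critic is trained with a quantile-regression Huber loss over sampled quantile fractions, which is what curbs overestimation. The paper's exact architecture is not given here, so the following is only a minimal numpy sketch of that standard loss (as used in IQN-style critics), with illustrative variable names:

```python
import numpy as np

def huber(u, kappa=1.0):
    # Elementwise Huber loss L_kappa(u): quadratic near zero, linear in the tails
    return np.where(np.abs(u) <= kappa,
                    0.5 * u ** 2,
                    kappa * (np.abs(u) - 0.5 * kappa))

def quantile_huber_loss(pred_quantiles, target_samples, taus, kappa=1.0):
    """Pairwise quantile-regression Huber loss (IQN-style critic objective).

    pred_quantiles: (N,) critic outputs at sampled quantile fractions `taus`
    target_samples: (M,) samples from the target return distribution
    taus:           (N,) quantile fractions drawn from (0, 1)
    """
    # TD errors for every (target, prediction) pair: u[i, j] = z_i - theta_j
    u = target_samples[:, None] - pred_quantiles[None, :]
    # Asymmetric weight |tau - 1{u < 0}| penalizes over- and under-estimation
    # differently per quantile, which is what shapes the full distribution
    weight = np.abs(taus[None, :] - (u < 0).astype(float))
    return np.mean(weight * huber(u, kappa) / kappa)
```

A perfect match between predicted quantiles and target samples drives the loss to zero; mismatched predictions are penalized asymmetrically per quantile fraction, so the critic learns a distribution of returns instead of a single (overestimation-prone) point estimate.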

All Content © PaperCept, Inc.
