ITSC 2024 Paper Abstract

Paper WeBT3.5

Yang, Fukun (Beijing University of Posts and Telecommunications), Hu, Zhiqun (Beijing University of Posts and Telecommunications), Zhang, Yuanming (Beijing University of Posts and Telecommunications), Huang, Hao (Beijing University of Posts and Telecommunications), Wang, Guixin (Beijing University of Posts and Telecommunications)

Refine Reinforcement Learning for Safety Training of Autonomous Driving

Scheduled for presentation during the Invited Session "AI-Enhanced Safety-Certifiable Autonomous Vehicles" (WeBT3), Wednesday, September 25, 2024, 15:50−16:10, Salon 6

2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada

Keywords: Automated Vehicle Operation, Motion Planning, Navigation, Advanced Vehicle Safety Systems, Driver Assistance Systems

Abstract

The random exploration inherent in reinforcement learning (RL) impedes the path to human-like autonomous driving, owing to prohibitively high safety requirements. In this paper, we propose a deep refine reinforcement learning (DR2L) approach that removes non-safety-critical actions and reconstructs critical ones, which effectively improves the efficiency of exploration. The core idea is an action filter based on a two-stage vehicle motion model that computes the critical value of dangerous actions and reconstructs the action space by filtering out obviously incorrect actions. In addition, we propose using the beta distribution as the stochastic policy, which eliminates the bias of the Gaussian policy and provides faster convergence. Finally, we design a spatial-temporal attention network to extract hidden environmental information as the state, further enhancing the performance of RL. Simulations show that DR2L effectively improves the safety of the agent during training. Our results show that the beta policy converges significantly faster than the Gaussian policy when both are used with proximal policy optimization (PPO).
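
For illustration only (not the authors' implementation), a beta-distribution policy head of the kind described above might be sketched as follows in PyTorch; the network sizes, the softplus+1 parameterization, and the action rescaling are assumptions made for this sketch.

# Minimal sketch, assuming a PyTorch actor network: a beta-distribution policy head for PPO.
# The beta distribution has bounded support on (0, 1), so sampled actions can be rescaled
# to a bounded control range without the clipping bias that affects a Gaussian policy.
import torch
import torch.nn as nn
from torch.distributions import Beta

class BetaPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        # Separate heads for the two shape parameters; softplus + 1 keeps both
        # parameters above 1, giving a unimodal distribution (a common choice).
        self.alpha_head = nn.Linear(hidden_dim, action_dim)
        self.beta_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        h = self.backbone(state)
        alpha = nn.functional.softplus(self.alpha_head(h)) + 1.0
        beta = nn.functional.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

# Usage: sample an action in (0, 1) and rescale to, e.g., steering in (-1, 1);
# the log-probability is what enters the PPO probability ratio.
policy = BetaPolicy(state_dim=16, action_dim=2)
dist = policy(torch.randn(1, 16))
a01 = dist.sample()                    # in (0, 1)
action = 2.0 * a01 - 1.0               # rescaled to (-1, 1)
log_prob = dist.log_prob(a01).sum(-1)  # summed over action dimensions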
