ITSC 2025 Paper Abstract


Paper TH-EA-T25.4

Li, Yaqing (Beijing Institute of Technology), Li, Xinke (Beijing Institute of Technology), Fu, Mengyin (Beijing Institute of Technology), Yang, Yi (Beijing Institute of Technology), Zhang, Ting (Beijing Institute of Technology)

ROP-DARL: Risk-aware Optimistic-Pessimistic Dual-Actor Reinforcement Learning for Safe Decision-making of Autonomous Vehicles

Scheduled for presentation during the Regular Session "S25b-Cooperative and Connected Autonomous Systems" (TH-EA-T25), Thursday, November 20, 2025, 14:30−14:50, Coolangatta 4

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords: Cooperative Driving Systems and Vehicle Coordination in Multi-vehicle Scenarios, Real-time Motion Planning and Control for Autonomous Vehicles in ITS Networks, Autonomous Vehicle Safety and Performance Testing

Abstract

Reinforcement learning algorithms are widely applied to autonomous driving decision-making in complex interactive environments; however, ensuring safety remains a significant challenge. Although safe reinforcement learning methods have been proposed, they still struggle to balance safety and efficiency in decision-making. To address these challenges, this work proposes a Risk-aware Optimistic-Pessimistic Dual-Actor Reinforcement Learning (ROP-DARL) approach, which enhances the safety performance of the model in three ways. First, we introduce a trajectory prediction model for scenario understanding and rank the predicted trajectories according to risk field theory. Second, the proposed dual policies generate hybrid strategies that dynamically balance the efficiency and safety of decision-making: the optimistic actor exploits the full prediction information to learn efficient strategies, while the pessimistic actor considers only high-risk predictions to generate cautious strategies. Finally, we employ the action mask method and analyze how it shapes the model's safety performance, further verifying the robustness of the proposed model. Experiments in three interactive traffic scenarios show that the proposed model achieves higher success rates and stronger safety guarantees even with diminished action masking.
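A minimal sketch of how the dual-actor blending and action masking described in the abstract might fit together. All names, the convex risk-weighted mixing rule, the discrete action set, and the mask fallback are illustrative assumptions, not the paper's actual ROP-DARL architecture:

```python
# Hedged sketch: mix an "optimistic" and a "pessimistic" policy's action
# distributions by a scenario risk weight, then apply an action mask.
# The blending rule and action set are assumptions for illustration only.

ACTIONS = ["keep_lane", "change_left", "change_right", "brake"]

def blend_policies(p_opt, p_pess, risk):
    """Convex mix: higher risk shifts weight toward the pessimistic actor."""
    assert len(p_opt) == len(p_pess)
    return [(1.0 - risk) * o + risk * p for o, p in zip(p_opt, p_pess)]

def apply_action_mask(probs, mask):
    """Zero out masked (unsafe) actions and renormalize."""
    masked = [p if m else 0.0 for p, m in zip(probs, mask)]
    total = sum(masked)
    if total == 0.0:  # all actions masked: fall back to a cautious default
        masked = [0.0] * len(probs)
        masked[ACTIONS.index("brake")] = 1.0
        return masked
    return [p / total for p in masked]

def select_action(p_opt, p_pess, risk, mask):
    probs = apply_action_mask(blend_policies(p_opt, p_pess, risk), mask)
    return ACTIONS[max(range(len(probs)), key=probs.__getitem__)], probs

# Usage: in a high-risk scene (risk = 0.8) the pessimistic actor dominates
# and the mask forbids lane changes, so the cautious action wins.
p_optimistic  = [0.1, 0.6, 0.2, 0.1]   # favors an efficient lane change
p_pessimistic = [0.3, 0.0, 0.0, 0.7]   # favors braking
mask = [True, False, False, True]      # lane changes ruled unsafe
action, probs = select_action(p_optimistic, p_pessimistic, 0.8, mask)
print(action)  # "brake" under these illustrative numbers
```

Weakening the mask (allowing more actions through) would let the blended distribution decide alone, which is the regime the abstract's "diminished action masking" experiments probe.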
