ITSC 2025 Paper Abstract

Paper VP-VP.103

Wang, Dejin (Northeastern University), Ghoreishi, Seyede Fatemeh (Northeastern University)

RGDR: Reward-Guided Domain Randomization for Autonomous Driving

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00−18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on April 2, 2026

Keywords: Autonomous Vehicle Safety and Performance Testing, Real-time Motion Planning and Control for Autonomous Vehicles in ITS Networks

Abstract

Autonomous vehicles must reliably operate in highly dynamic and unpredictable environments, which often diverge significantly from their training conditions. Domain randomization (DR) is a widely adopted approach to enhance generalization and robustness, particularly in zero-shot scenarios where the target deployment domain is unknown during training. However, conventional DR methods typically rely on uniform random sampling of environmental parameters, leading to imbalanced training that overfits to simpler scenarios and underperforms in challenging, safety-critical situations. To address these limitations, we propose a novel Reward-Guided Domain Randomization (RGDR) framework that adaptively prioritizes informative and difficult environments based on real-time reward feedback from the reinforcement learning policy. Specifically, RGDR employs weighted kernel density estimation to assign higher sampling probabilities to environments yielding lower policy rewards, thereby effectively focusing training on regions critical for robustness. Importantly, the proposed method maintains linear or sublinear computational complexity, ensuring scalability to complex, high-dimensional, real-world tasks. Extensive experiments conducted on the OpenAI Gym and CARLA driving benchmarks validate the proposed RGDR framework, demonstrating superior policy robustness and significantly improved generalization to out-of-distribution scenarios compared to both uniform and active domain randomization baselines.
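The core sampling idea in the abstract — weighting environments inversely to the policy's reward and drawing new environment parameters from a weighted kernel density estimate — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the softmax weighting, the Gaussian bandwidth, and the two-dimensional parameter space are all illustrative assumptions. Sampling from a weighted Gaussian KDE is done here in its mixture form: pick a centre in proportion to its weight, then add Gaussian noise.

```python
import numpy as np

def reward_weights(rewards, temperature=1.0):
    # Lower reward -> higher sampling weight (softmax over negated rewards).
    # The temperature is a hypothetical knob controlling how sharply the
    # sampler focuses on the hardest environments.
    z = -np.asarray(rewards, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()

def sample_environments(params, rewards, n_samples, bandwidth=0.1, rng=None):
    """Draw new environment parameters from a weighted Gaussian KDE
    centred on previously evaluated parameter vectors, so that
    low-reward (hard) environments are sampled more often."""
    rng = np.random.default_rng(rng)
    params = np.atleast_2d(np.asarray(params, dtype=float))
    w = reward_weights(rewards)
    # Weighted-KDE sampling in mixture form: choose a kernel centre with
    # probability proportional to its weight, then perturb it.
    idx = rng.choice(len(params), size=n_samples, p=w)
    noise = rng.normal(scale=bandwidth, size=(n_samples, params.shape[1]))
    return params[idx] + noise

# Illustrative usage: three evaluated environments, each described by two
# hypothetical parameters (e.g. road friction, traffic density), with the
# policy's episode return in each.
params  = [[0.2, 0.8], [0.5, 0.1], [0.9, 0.4]]
rewards = [10.0, -5.0, 3.0]          # the second environment is hardest
batch = sample_environments(params, rewards, n_samples=4, rng=0)
```

Each training round costs one weight computation and one categorical draw per sample, which is consistent with the linear-or-better complexity claimed in the abstract.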

All Content © PaperCept, Inc.
