ITSC 2025 Paper Abstract

Paper TH-EA-T23.4

Gu, Hankang (Xi'an Jiaotong-Liverpool University), Zhang, Yuli (Xi'an Jiaotong-Liverpool University), Wang, Chengming (Xi'an Jiaotong-Liverpool University), Jiang, Ruiyuan (Xi'an Jiaotong-Liverpool University), Qiao, Ziheng (Xi'an Jiaotong-Liverpool University), Fan, Pengfei (Xi'an Jiaotong-Liverpool University), Jia, Dongyao (Xi'an Jiaotong-Liverpool University)

A Hierarchical Deep Reinforcement Learning Framework for Traffic Signal Control with Predictable Cycle Planning

Scheduled for presentation during the Invited Session "S23b-Trustworthy AI for Traffic Sensing and Control" (TH-EA-T23), Thursday, November 20, 2025, 14:30−14:50, Coolangatta 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords: AI, Machine Learning for Dynamic Traffic Signal Control and Optimization

Abstract

Deep reinforcement learning (DRL) has become a popular approach to traffic signal control (TSC) because it can learn adaptive policies from complex traffic environments. Within DRL-based TSC, the two primary control paradigms are the "choose phase" and "switch" strategies. In the choose phase paradigm, the agent adaptively selects the next active phase, but the resulting phase sequences can be unexpected for drivers, disrupting their anticipation and potentially compromising safety at intersections. In the switch paradigm, the agent decides whether to switch to the next predefined phase or to extend the current one. Although this structure keeps the phase order more predictable, it can lead to unfair and inefficient phase allocations, as certain movements may be extended disproportionately while others are neglected. In this paper, we propose a DRL model, named Deep Hierarchical Cycle Planner (DHCP), that allocates the traffic signal cycle duration hierarchically. A high-level agent first determines the split of the total cycle time between the North-South (NS) and East-West (EW) directions based on the overall traffic state. A low-level agent then divides the duration allocated to each major direction between straight and left-turn movements, enabling more flexible durations for the two movements. We test our model on both real and synthetic road networks, along with multiple sets of real and synthetic traffic flows. Empirical results show that our model achieves the best performance against the baselines on all datasets.
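The two-level allocation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `CyclePlan`, `split`, and `plan_cycle`, the `min_green` clamp, and the idea of passing the agents' decisions in as plain fractions are all assumptions made for illustration. In DHCP those fractions would come from the learned high-level and low-level policies, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CyclePlan:
    """Green durations (seconds) for the four movements in one cycle."""
    ns_straight: float
    ns_left: float
    ew_straight: float
    ew_left: float

    def total(self) -> float:
        return self.ns_straight + self.ns_left + self.ew_straight + self.ew_left


def split(total: float, fraction: float, min_share: float) -> Tuple[float, float]:
    """Split `total` in two, clamping so each part gets at least `min_share`."""
    if total < 2 * min_share:
        raise ValueError("total is too short to give both parts the minimum share")
    first = min(max(fraction * total, min_share), total - min_share)
    return first, total - first


def plan_cycle(cycle_length: float,
               ns_fraction: float,
               ns_straight_fraction: float,
               ew_straight_fraction: float,
               min_green: float = 5.0) -> CyclePlan:
    """Hypothetical hierarchical allocation of one signal cycle.

    ns_fraction stands in for the high-level agent's decision; the two
    *_straight_fraction values stand in for the low-level agent's decisions.
    The min_green clamp is an assumption, not something the abstract states.
    """
    # High level: divide the cycle between NS and EW. Each direction must
    # still have room for two phases, hence the 2 * min_green floor.
    ns_time, ew_time = split(cycle_length, ns_fraction, 2 * min_green)
    # Low level: divide each direction's time between straight and left-turn.
    ns_straight, ns_left = split(ns_time, ns_straight_fraction, min_green)
    ew_straight, ew_left = split(ew_time, ew_straight_fraction, min_green)
    return CyclePlan(ns_straight, ns_left, ew_straight, ew_left)
```

For example, `plan_cycle(120, 0.6, 0.75, 0.5)` gives NS 72 s and EW 48 s at the high level, then splits them into 54 s and 18 s for NS straight and left-turn, and 24 s each for the EW movements. Because the phase order is fixed and only the durations vary, the sequence stays predictable for drivers while every movement still receives green time in each cycle.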