ITSC 2025 Paper Abstract

Paper TH-LM-T23.2

Zhang, Meng (Zhejiang University), Wang, Dianhai (Zhejiang University), Yu, Hongxin (Zhejiang University), Wang, Xu (Zhejiang University), Zhang, Jiahao (The University of Queensland), Cai, Zhengyi (Zhejiang University)

A Subarea-Based Multi Agent Reinforcement Learning Approach for Traffic Signal Control

Scheduled for presentation during the Invited Session "S23a-Trustworthy AI for Traffic Sensing and Control" (TH-LM-T23), Thursday, November 20, 2025, 10:50−11:10, Coolangatta 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords AI, Machine Learning for Dynamic Traffic Signal Control and Optimization, Smart Roadway Networks with IoT-enabled Sensors and Real-time Data Analytics

Abstract

Multi-agent reinforcement learning based on centralized training with decentralized execution has emerged as an efficient approach to optimizing multi-intersection signal control. However, existing methods often overlook the correlations between intersections and treat all agent relationships equally during global optimization, which limits efficiency and performance. To address this, this paper proposes a subarea-based multi-agent signal control method comprising two key modules: (1) a subarea partitioning module and (2) a hierarchical optimization module. The subarea partitioning module calculates intersection correlation degrees and identifies potential community structures using community detection algorithms. The hierarchical optimization module decomposes the global value function into subarea value functions and local agent value functions through a hierarchical decomposition framework, which accelerates learning of the global value function. Furthermore, we design an encoder-decoder to guide agents’ decision-making based on subarea-level context, and introduce a more compact input to speed up the fitting of the global value function. Numerical experiments demonstrate that our method achieves a 7% improvement in key metrics compared to baseline RL methods and outperforms actuated control methods by over 40%.
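As a rough illustration of the two modules named in the abstract, the following minimal Python sketch groups intersections into subareas and then aggregates values hierarchically. It is not the paper's method. The correlation matrix, the threshold, and the function names `partition_subareas` and `hierarchical_values` are hypothetical. Connected components of a thresholded correlation graph stand in for the community detection algorithm, and a plain additive sum stands in for the learned hierarchical decomposition.

```python
from collections import defaultdict


def partition_subareas(corr, threshold):
    """Group intersections whose pairwise correlation is >= threshold.

    Assumption: connected components of the thresholded correlation graph
    are used as a simple stand-in for a community detection algorithm.
    """
    n = len(corr)
    parent = list(range(n))  # union-find forest, one node per intersection

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root for stable output.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


def hierarchical_values(agent_q, subareas):
    """Aggregate agent values into subarea values, then a global value.

    Assumption: purely additive aggregation, chosen only for illustration.
    """
    subarea_q = [sum(agent_q[i] for i in members) for members in subareas]
    return subarea_q, sum(subarea_q)


# Hypothetical correlation degrees between six intersections.
corr = [
    [1.0, 0.8, 0.7, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.6, 0.2, 0.1, 0.0],
    [0.7, 0.6, 1.0, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.7],
    [0.0, 0.1, 0.2, 0.9, 1.0, 0.8],
    [0.1, 0.0, 0.1, 0.7, 0.8, 1.0],
]
subareas = partition_subareas(corr, threshold=0.5)

# Hypothetical per-agent values for the currently selected actions.
agent_q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
subarea_q, global_q = hierarchical_values(agent_q, subareas)
print(subareas, subarea_q, global_q)
```

With these made-up numbers, the six intersections split into two subareas of three, and the global value is the sum of the two subarea values. A learned mixing network and a modularity-based community detection routine would replace the two stand-ins in a realistic setting.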