ITSC 2025 Paper Abstract

Paper WE-LA-T9.3

Hu, Yulong (The Hong Kong University of Science and Technology), Du, Yali (King's College London), Li, Sen (The Hong Kong University of Science and Technology)

Multi-Stage Multi-Agent Reinforcement Learning for On-Demand Food-Delivery Services with a Mixed Fleet of Human Couriers and Drones

Scheduled for presentation during the Regular Session "S09c-Optimization for Multimodal and On-Demand Urban Mobility Systems" (WE-LA-T9), Wednesday, November 19, 2025, 16:40−17:00, Coolangata 3

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 19, 2025

Keywords Multimodal Transportation Networks for Efficient Urban Mobility, Data Analytics and Real-time Decision Making for Autonomous Traffic Management

Abstract

This work studies a novel collaborative multimodal delivery system for urban food logistics that combines the complementary strengths of human couriers and unmanned aerial vehicles (UAVs). The system uses a two-stage delivery process: human couriers collect orders and transport them to strategically located launchpads, and drones then complete the final delivery leg to kiosks. To realize the potential of this hybrid system, we develop a Multi-Stage Multi-Agent Reinforcement Learning (MS-MARL) framework that combines decentralized decision-making, modeled as a Multi-Agent Markov Decision Process (MAMDP), with centralized coordination via dynamic bipartite matching. Courier and drone agents learn anticipatory drop-off and relocation policies in the MAMDP, while the matching mechanism periodically optimizes agent-task assignments using the value functions learned in the MAMDP. To enable effective learning of multi-agent coordination, we develop a progressive multi-stage curriculum training strategy with two phases: a pretraining phase in which agents learn independently under simplified collaboration assumptions, followed by a joint fine-tuning phase that optimizes the complete system under realistic interaction dynamics. Experimental results based on Hong Kong food delivery data validate the effectiveness of the MS-MARL framework.
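The value-informed matching step described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' formulation: the edge score (immediate reward plus a discounted learned value of the task's destination), the names `edge_scores` and `match_agents_to_tasks`, the toy numbers, and the brute-force search, which stands in for a proper assignment solver and is only practical for tiny instances.

```python
from itertools import permutations

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical learned state values for two launchpads.
state_value = {"pad_A": 2.0, "pad_B": 5.0}

# immediate_reward[i][j]: assumed reward if agent i serves task j.
immediate_reward = [
    [4.0, 1.0, 3.0],
    [3.0, 2.0, 1.0],
]
# Destination reached after completing each task (assumed).
task_destination = ["pad_A", "pad_B", "pad_B"]


def edge_scores(rewards, destinations, values, gamma):
    """Score each agent-task edge as reward + gamma * V(destination)."""
    return [
        [r + gamma * values[destinations[j]] for j, r in enumerate(row)]
        for row in rewards
    ]


def match_agents_to_tasks(scores):
    """Return (best_total, {agent: task}) maximizing the summed edge score.

    Brute force over all one-to-one assignments; assumes there are at
    least as many tasks as agents.
    """
    n_agents, n_tasks = len(scores), len(scores[0])
    best_total, best_tasks = float("-inf"), None
    for tasks in permutations(range(n_tasks), n_agents):
        total = sum(scores[i][j] for i, j in enumerate(tasks))
        if total > best_total:
            best_total, best_tasks = total, tasks
    return best_total, dict(enumerate(best_tasks))


scores = edge_scores(immediate_reward, task_destination, state_value, GAMMA)
total, assignment = match_agents_to_tasks(scores)
print(assignment, total)
```

In this toy instance the value term changes which tasks are chosen compared with matching on immediate reward alone, which is the anticipatory effect the abstract attributes to using learned value functions. For realistic problem sizes, a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.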