ITSC 2025 Paper Abstract


Paper VP-VP.82

Zhang, Xinyuan (University of Chinese Academy of Sciences), Tian, Yonglin (Institute of Automation, Chinese Academy of Sciences), Lin, Fei (Macau University of Science and Technology), Liu, Yue (State Key Laboratory of Multimodal Artificial Intelligence Systems), Ma, Jing (China Ship Research and Development Academy), Wang, Xiao (Anhui University), Szatmáry, Kornélia Sára (Obuda University), Wang, Fei-Yue (Institute of Automation, Chinese Academy of Sciences)

LogisticsVLN: Vision-Language Navigation for Low-Altitude Terminal Delivery Based on Agentic UAVs

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00−18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on April 2, 2026

Keywords: Last-Mile Delivery Optimization with Autonomous Vehicles and Drones, Low Altitude Urban Mobility and Logistics

Abstract

The growing demand for intelligent logistics, particularly fine-grained terminal delivery, underscores the need for autonomous UAV (Unmanned Aerial Vehicle)-based delivery systems. However, most existing last-mile delivery studies rely on ground robots, while current UAV-based Vision-Language Navigation (VLN) tasks primarily focus on coarse-grained, long-range goals, making them unsuitable for precise terminal delivery. To bridge this gap, we propose LogisticsVLN, a scalable aerial delivery system built on multimodal large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN integrates lightweight Large Language Models (LLMs) and Vision-Language Models (VLMs) in a modular pipeline for request understanding, floor localization, object detection, and action decision-making. To support research and evaluation in this new setting, we construct the Vision-Language Delivery (VLD) dataset within the CARLA simulator. Experimental results on the VLD dataset demonstrate the feasibility of the LogisticsVLN system. In addition, we conduct subtask-level evaluations of each module of our system, offering valuable insights for improving the robustness and real-world deployment of foundation-model-based vision-language delivery systems.


All Content © PaperCept, Inc.

