ITSC 2025 Paper Abstract


Paper VP-VP.70

Chen, Liming (Xi'an Jiaotong University), Guo, Zhongyu (Xi'an Jiaotong University), Wang, Ningnan (Xi'an Jiaotong University), Ji, Haoxuan (Xi'an Jiaotong University), Chen, Weihuang (Xi'an Jiaotong University), Wang, Yitian (Xi'an Institute of Applied Optics), Sun, Hongbin (Xi'an Jiaotong University)

RSG-VLN: Relevance Semantic Map Guided Vision-Language Navigation

Scheduled for presentation during the Video Session "On-Demand Video Presentations" (VP-VP), Saturday, November 22, 2025, 08:00−18:00, On-Demand Platform

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on April 2, 2026

Keywords Deep Learning for Scene Understanding and Semantic Segmentation in Autonomous Vehicles

Abstract

Vision-Language Navigation (VLN) enables autonomous agents to navigate unseen environments by following natural language instructions grounded in visual observations. Despite advances driven by foundation models, critical challenges persist: inefficient trajectories caused by static semantic maps, error propagation in multi-step instruction decomposition, and computational bottlenecks in real-world deployment. This paper presents Relevance Semantic Map-Guided VLN (RSG-VLN), a new framework that addresses these limitations by dynamically aligning semantic scene understanding with task objectives. Specifically, RSG-VLN introduces relevance semantic maps that leverage Pointwise Mutual Information to quantify contextual associations between observed objects and task goals, enabling the agent to prioritize actions with high semantic-task relevance. In addition, a large language model is employed to decompose long-horizon instructions into temporally ordered subtasks, each mapped to a localized scene target. A hybrid navigation strategy integrates global path planning, which connects subtask waypoints, with local obstacle avoidance based on classical exploration methods, ensuring robustness in complex layouts. Extensive experiments on the Habitat-Matterport 3D and Matterport 3D datasets demonstrate that RSG-VLN achieves state-of-the-art performance across a range of VLN tasks. These results underscore the potential of RSG-VLN as a scalable solution for real-world applications that require precise, context-aware navigation.
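The abstract's relevance scoring rests on Pointwise Mutual Information, PMI(o, g) = log(P(o, g) / (P(o) P(g))), which is positive when an object o co-occurs with a task goal g more often than chance. The sketch below illustrates this computation on toy co-occurrence counts; the episode data and function names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def pmi(pair_counts, obj_counts, goal_counts, n_episodes):
    """PMI(o, g) = log( P(o, g) / (P(o) * P(g)) ).
    Positive scores mean the object co-occurs with the goal more
    often than chance, making it a useful landmark for that goal."""
    scores = {}
    for (obj, goal), c in pair_counts.items():
        p_joint = c / n_episodes
        p_obj = obj_counts[obj] / n_episodes
        p_goal = goal_counts[goal] / n_episodes
        scores[(obj, goal)] = math.log(p_joint / (p_obj * p_goal))
    return scores

# Toy statistics over hypothetical navigation episodes:
# (objects observed in the episode, goal region of the episode).
episodes = [
    ({"mug", "stove"}, "kitchen"),
    ({"mug", "sink"}, "kitchen"),
    ({"sofa", "tv"}, "living_room"),
    ({"mug", "sofa"}, "living_room"),
]
pair_counts, obj_counts, goal_counts = Counter(), Counter(), Counter()
for objects, goal in episodes:
    goal_counts[goal] += 1
    for o in objects:
        obj_counts[o] += 1
        pair_counts[(o, goal)] += 1

scores = pmi(pair_counts, obj_counts, goal_counts, len(episodes))
```

With these counts, "mug" scores positively for "kitchen" and negatively for "living_room", so a PMI-guided agent would prefer mug-adjacent actions when the goal is the kitchen.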

All Content © PaperCept, Inc.

