ITSC 2025 Paper Abstract


Paper TH-LM-T20.5

Shi, Qingyuan (Tsinghua University), Meng, Qingwen (Tsinghua University), Cheng, Hao (Tsinghua University), Xu, Qing (Tsinghua University), Wang, Jianqiang (Tsinghua University)

LinguaSim: Interactive Multi-Vehicle Testing Scenario Generation Via Natural Language Instruction Based on Large Language Models

Scheduled for presentation during the Invited Session "S20a-Foundation Model-Enabled Scene Understanding, Reasoning, and Decision-Making for Autonomous Driving and ITS" (TH-LM-T20), Thursday, November 20, 2025, 11:50−12:10, Surfers Paradise 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia


Keywords: Autonomous Vehicle Safety and Performance Testing

Abstract

The generation of testing and training scenarios for autonomous vehicles has drawn significant attention. While Large Language Models (LLMs) have enabled new scenario generation methods, current approaches struggle to balance adherence to the input command with the realism of real-world driving environments. To reduce scenario description complexity, these methods often compromise realism by limiting scenarios to 2D or to open-loop simulations in which background vehicles follow predefined, non-interactive behaviors. We propose LinguaSim, an LLM-based framework that converts natural language into realistic, interactive 3D scenarios, ensuring both dynamic vehicle interactions and faithful alignment between the input descriptions and the generated scenarios. A feedback calibration module further refines the generation precision, improving fidelity to user intent. By bridging the gap between natural language and closed-loop, interactive simulations, LinguaSim constrains adversarial vehicle behaviors using both the scenario description and the autonomous driving model guiding them. The framework thus facilitates the creation of high-fidelity scenarios that enhance safety testing and training. Experiments show that LinguaSim can generate scenarios whose criticality varies with the natural language description (ACT: 0.072 s for dangerous vs. 3.532 s for safe descriptions; comfortability: 0.654 vs. 0.764), and that its refinement module effectively reduces excessive aggressiveness in LinguaSim's initial outputs, lowering the crash rate from 46.9% to 6.3% to better match user intentions.
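
The abstract gives no implementation detail, but the closed-loop, feedback-calibrated generation it describes follows a recognizable generate-simulate-refine pattern. The minimal Python sketch below illustrates only that general pattern; every name, threshold, and metric in it (generate, simulate, min_ttc, the 0.05 s cutoff) is an assumption for illustration, not LinguaSim's actual interface.

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Metrics:
        """Criticality and comfort statistics collected from one simulated rollout."""
        min_ttc: float   # minimum time-to-collision over the rollout, in seconds
        crash: bool      # whether an adversarial vehicle actually collided with the ego
        comfort: float   # smoothness score in [0, 1]; higher is more comfortable


    def generate_with_feedback(
        description: str,
        generate: Callable[[str, str], dict],  # (description, feedback) -> scenario spec, e.g. an LLM call
        simulate: Callable[[dict], Metrics],   # closed-loop rollout in which background vehicles react to the ego
        max_rounds: int = 3,
    ) -> dict:
        """Generate a scenario from text, then iteratively calibrate it so the
        measured criticality matches the description's intent (for example,
        dangerous but not an unavoidable crash)."""
        feedback = ""
        scenario = generate(description, feedback)
        for _ in range(max_rounds):
            m = simulate(scenario)
            # Accept once the rollout is critical but still drivable; the
            # 0.05 s threshold is purely illustrative.
            if not m.crash and m.min_ttc > 0.05:
                break
            feedback = (
                f"Last attempt: crash={m.crash}, min TTC={m.min_ttc:.3f} s, "
                f"comfort={m.comfort:.2f}. Reduce adversarial aggressiveness "
                f"while staying consistent with: {description!r}"
            )
            scenario = generate(description, feedback)
        return scenario

In this sketch the simulator's criticality metrics are folded back into the next generation prompt, mirroring how the abstract's calibration module is said to rein in over-aggressive initial outputs.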

