ITSC 2025 Paper Abstract


Paper TH-LA-T18.2

Gao, Yuan (Technical University of Munich), Piccinini, Mattia (Technical University of Munich), Möller, Korbinian (Technical University of Munich), Alanwar, Amr (KTH), Betz, Johannes (Technical University of Munich)

From Words to Collisions: LLM-Guided Evaluation and Adversarial Generation of Safety-Critical Driving Scenarios

Scheduled for presentation during the Invited Session "S18c-Innovative Applications of LLM in Multimodal Transportation Systems" (TH-LA-T18), Thursday, November 20, 2025, 16:20−16:40, Southport 3

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia


Keywords Autonomous Vehicle Safety and Performance Testing, Safety Verification and Validation Methods for Autonomous Vehicle Technologies

Abstract

Ensuring the safety of autonomous vehicles requires virtual scenario-based testing, which in turn depends on the robust evaluation and generation of safety-critical scenarios. So far, researchers have used scenario-based testing frameworks that rely heavily on handcrafted scenarios and safety metrics. To reduce the effort of human interpretation and overcome the limited scalability of these approaches, we combine Large Language Models (LLMs) with structured scenario parsing and prompt engineering to automatically evaluate and generate safety-critical driving scenarios. We introduce Cartesian and Ego-centric prompt strategies for scenario evaluation, and an adversarial generation module that modifies the trajectories of risk-inducing vehicles (ego-attackers) to create critical scenarios. We validate our approach using a 2D simulation framework and multiple pre-trained LLMs. The results show that the evaluation module effectively detects collision scenarios and infers scenario safety, while the generation module identifies high-risk agents and synthesizes realistic, safety-critical scenarios. We conclude that an LLM equipped with domain-informed prompting techniques can effectively evaluate and generate safety-critical driving scenarios, reducing dependence on handcrafted metrics. We release our open-source code and scenarios at: https://github.com/TUM-AVS/From-Words-to-Collisions.
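
The two prompt strategies named in the abstract can be pictured with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' released implementation: the AgentState container, the prompt wording, and the query_llm callable are hypothetical placeholders standing in for the paper's structured scenario parsing and LLM interface.

```python
# Illustrative sketch only: a Cartesian (global-frame) vs. Ego-centric scenario
# description fed to an LLM for safety evaluation. All names below are
# hypothetical placeholders, not the repository's actual API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentState:
    agent_id: str
    x: float   # position [m], global frame
    y: float
    vx: float  # velocity [m/s], global frame
    vy: float


def cartesian_prompt(ego: AgentState, others: List[AgentState]) -> str:
    """Describe all agents in absolute (Cartesian) coordinates."""
    lines = [f"Ego vehicle at ({ego.x:.1f}, {ego.y:.1f}) m, "
             f"velocity ({ego.vx:.1f}, {ego.vy:.1f}) m/s."]
    for a in others:
        lines.append(f"Agent {a.agent_id} at ({a.x:.1f}, {a.y:.1f}) m, "
                     f"velocity ({a.vx:.1f}, {a.vy:.1f}) m/s.")
    lines.append("Assess how safety-critical this scene is for the ego vehicle.")
    return "\n".join(lines)


def ego_centric_prompt(ego: AgentState, others: List[AgentState]) -> str:
    """Describe the other agents relative to the ego vehicle."""
    lines = []
    for a in others:
        dx, dy = a.x - ego.x, a.y - ego.y
        rvx, rvy = a.vx - ego.vx, a.vy - ego.vy
        lines.append(f"Agent {a.agent_id}: offset ({dx:.1f}, {dy:.1f}) m from ego, "
                     f"relative velocity ({rvx:.1f}, {rvy:.1f}) m/s.")
    lines.append("Assess how safety-critical this scene is for the ego vehicle.")
    return "\n".join(lines)


def evaluate_scenario(ego: AgentState,
                      others: List[AgentState],
                      query_llm: Callable[[str], str],
                      ego_centric: bool = True) -> str:
    """Build the chosen prompt style and return the LLM's safety assessment."""
    prompt = ego_centric_prompt(ego, others) if ego_centric else cartesian_prompt(ego, others)
    return query_llm(prompt)
```

In this sketch, query_llm is any function that sends a text prompt to a pre-trained LLM and returns its reply; the paper's released code at the URL above should be consulted for the actual prompt templates and scenario parser.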

