ITSC 2025 Paper Abstract


Paper FR-LA-T40.1

Mahawatta Dona, Malsha Ashani (University of Gothenburg, Sweden), Cabrero-Daniel, Beatriz (University of Gothenburg | Chalmers University of Technology), Yu, Yinan (Chalmers University of Technology), Berger, Christian (Chalmers University of Technology | University of Gothenburg)

BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems

Scheduled for presentation during the Regular Session "S40c-Cooperative and Connected Autonomous Systems" (FR-LA-T40), Friday, November 21, 2025, 16:00−16:20, Coolangatta 4

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords Cooperative Driving Systems and Vehicle Coordination in Multi-vehicle Scenarios, Cyber-Physical Systems for Real-time Traffic Monitoring and Control, Cooperative Vehicle-to-Vehicle Data Sharing for Safe and Efficient Traffic Flow

Abstract

Large language models (LLMs) are increasingly extended to process multimodal data such as text and video simultaneously. Their remarkable performance in understanding what is shown in images surpasses that of specialized neural networks (NNs) such as YOLO, which support only a well-formed but very limited vocabulary, i.e., the objects they are able to detect. When unrestricted, LLMs, and in particular state-of-the-art vision language models (VLMs), show impressive performance in describing even complex traffic situations. This makes them potentially suitable components for automotive perception systems to support the understanding of complex traffic situations or edge cases. However, LLMs and VLMs are prone to hallucination, which means they may either fail to see traffic agents, such as vulnerable road users, who are present in a situation, or see traffic agents who are not there in reality. While the latter is unwanted, causing an advanced driver-assistance system (ADAS) or autonomous driving system (ADS) to slow down unnecessarily, the former could lead to disastrous decisions by an ADS. In our work, we systematically assess the performance of three state-of-the-art VLMs on a diverse subset of traffic situations sampled from the Waymo Open Dataset to support safety guardrails for capturing such hallucinations in VLM-supported perception systems. We observe that both proprietary and open VLMs exhibit remarkable image understanding capabilities, even paying thorough attention to fine details that are sometimes difficult for humans to spot. However, they are also still prone to making up elements in their descriptions, which to date requires hallucination detection strategies such as BetterCheck, which we propose in our work.
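To illustrate the kind of safety guardrail the abstract alludes to, the sketch below implements a generic consistency-based hallucination check: several VLM descriptions of the same frame are compared, and traffic agents mentioned by only a minority of responses are flagged as candidate hallucinations. The agent vocabulary, the naive keyword extraction, and the support threshold are illustrative assumptions for this sketch only; they are not the BetterCheck method described in the paper.

```python
import re
from collections import Counter

# Illustrative agent vocabulary (an assumption, not the paper's taxonomy).
AGENT_VOCAB = {"car", "truck", "bus", "pedestrian", "cyclist", "motorcycle"}

def extract_agents(description: str) -> set[str]:
    """Naive keyword extraction of traffic agents from a free-text description."""
    words = re.findall(r"[a-z]+", description.lower())
    found = set()
    for w in words:
        if w in AGENT_VOCAB:
            found.add(w)
        elif w.endswith("s") and w[:-1] in AGENT_VOCAB:  # crude plural match
            found.add(w[:-1])
    return found

def flag_inconsistent_agents(descriptions: list[str], min_support: float = 0.5):
    """Split mentioned agents into a consensus set and a suspect set.

    Agents mentioned in fewer than `min_support` of the responses are
    treated as candidate hallucinations.
    """
    counts: Counter[str] = Counter()
    for d in descriptions:
        counts.update(extract_agents(d))
    n = len(descriptions)
    consensus = {a for a, c in counts.items() if c / n >= min_support}
    suspect = {a for a, c in counts.items() if c / n < min_support}
    return consensus, suspect

# Example: three (hypothetical) VLM descriptions of the same frame.
descriptions = [
    "A car and a pedestrian are crossing near the intersection.",
    "One car approaches; a pedestrian waits at the curb.",
    "A car, a pedestrian, and a cyclist are visible.",
]
consensus, suspect = flag_inconsistent_agents(descriptions)
print(consensus)  # agents most responses agree on
print(suspect)    # minority mentions: candidate hallucinations
```

A missed-agent check would work symmetrically: agents present in ground truth but absent from the consensus set indicate the more safety-critical failure mode the abstract highlights.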
