Paper WeAT13.3
Sinha, Shreya (University of California, Santa Cruz), Paranjape, Ishaan (University of California, Santa Cruz), Whitehead, Jim (UC Santa Cruz)
ScenarioQA: Evaluating Test Scenario Reasoning Capabilities of Large Language Models
Scheduled for presentation during the Poster Session "Large Language Models" (WeAT13), Wednesday, September 25, 2024,
10:30−12:30, Foyer
2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24–27, 2024, Edmonton, Canada
This information is tentative and subject to change. Compiled on October 14, 2024
Keywords Multi-autonomous Vehicle Studies, Models, Techniques and Simulations, Driver Assistance Systems, Advanced Vehicle Safety Systems
Abstract
Autonomous Vehicles (AVs) have the potential to reduce car accidents and increase access to transportation, but they must be rigorously tested. Simulation-based testing, particularly scenario-based testing, offers a set of approaches for designing high-risk tests for AVs at low cost. Since AVs need to be tested against a large number of scenarios, automated generation approaches are needed. Pre-trained Large Language Models (LLMs) are open-input, general-purpose data generators with good learning and reasoning abilities. However, due to the black-box nature of these systems, it is difficult to obtain direct evidence of those abilities. In this paper, we address the open question of the reasoning capabilities of pre-trained LLMs, specifically in the context of scenario-based testing of AVs. Inspired by QA benchmarks for evaluating LLMs on commonsense reasoning, science reasoning, and more, we present our main contribution, ScenarioQA. This benchmark involves an LLM-based QA generation process that integrates several methods to generate questions and corresponding answers specifically in the context of scenario-based testing. We carry out a comprehensive evaluation of this process and gain valuable insights into effective QA generation. In addition, we evaluate several available pre-trained LLMs for these abilities.