ITSC 2025 Paper Abstract

Paper FR-LA-T38.1

Yang, Zichong (Purdue University), Panchal, Jitesh (Purdue University), Wang, Ziran (Purdue University)

A Multi-Fidelity Risk-Based Testing Framework for Evolving AI Systems: An Autonomous Driving Study

Scheduled for presentation during the Regular Session "S38c-Towards Scalable and Trustworthy AI in Connected Mobility" (FR-LA-T38), Friday, November 21, 2025, 16:00−16:20, Coolangata 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords: Evaluation of Autonomous Vehicle Performance in Mixed Traffic Environments; Autonomous Vehicle Safety and Performance Testing; Safety Verification and Validation Methods for Autonomous Vehicle Technologies

Abstract

Testing and evaluation of artificial intelligence (AI) systems for autonomous driving present significant challenges due to the systems' evolving nature and the complexity of their operational environments. Traditional approaches that rely on fixed datasets or limited real-world testing often fail to assess system reliability comprehensively across diverse scenarios. This paper presents a novel multi-fidelity risk-based testing framework designed specifically for evolving AI systems, built on three key innovations: 1) systematic quantification of failure modes and their associated risks derived from requirements, 2) a sequential design of experiments that efficiently explores the simulation model space while minimizing resource utilization, and 3) knowledge transfer mechanisms that leverage prior testing results when evaluating updated AI versions. We demonstrate the approach through a case study of autonomous vehicle perception systems, using four progressive versions of YOLO object detection systems in a simulated environment with varying fidelity levels. Results show that our sequential testing approach achieves the required confidence levels with significantly fewer iterations than conventional methods, while the risk balance metrics reveal clear improvements across successive AI generations. The framework enables acquisition organizations to evaluate AI components comprehensively even with limited access to development details, bridging the gap between laboratory performance and operational reliability in safety-critical autonomous systems.
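The abstract does not give the paper's algorithms, so the following is only an illustrative sketch of the kind of sequential, confidence-driven testing loop it describes: scenarios are run one at a time until an upper confidence bound on the failure rate falls below a risk threshold, bounding the test budget. The function names, the Wilson-score stopping rule, and the 5% threshold are assumptions for illustration, not the authors' method.

```python
import math


def wilson_upper_bound(failures: int, trials: int, z: float = 1.96) -> float:
    """Upper Wilson score bound on the true failure probability
    (z = 1.96 corresponds to ~95% confidence)."""
    if trials == 0:
        return 1.0  # no evidence yet: failure rate could be anything
    p = failures / trials
    center = p + z**2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (center + margin) / (1 + z**2 / trials)


def sequential_test(run_scenario, risk_threshold=0.05, max_iters=2000):
    """Run scenarios sequentially, stopping as soon as the upper
    confidence bound on the failure rate drops below the threshold."""
    failures = 0
    for n in range(1, max_iters + 1):
        failures += run_scenario()  # 1 = scenario failed, 0 = scenario passed
        if wilson_upper_bound(failures, n) < risk_threshold:
            return "accept", n  # required confidence reached early
    return "inconclusive", max_iters
```

With an always-passing scenario and a 5% risk threshold, this loop accepts after roughly 73 failure-free runs rather than exhausting a fixed test budget, which is the efficiency argument the abstract makes for sequential over conventional fixed-sample testing.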

All Content © PaperCept, Inc.

