ITSC 2025 Paper Abstract

Paper FR-LM-T32.2

Yoo, Hailey Hyosun (University of Melbourne), Sarvi, Majid (University of Melbourne), Bagloee, Saeed (University of Melbourne)

High-Accuracy Audio-Based Vehicle Detection: Deep Learning vs Machine Learning

Scheduled for presentation during the Regular Session "S32a-AI-Driven Traffic Monitoring, Safety, and Anomaly Detection" (FR-LM-T32), Friday, November 21, 2025, 10:50−11:10, Southport 2

2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC), November 18-21, 2025, Gold Coast, Australia

This information is tentative and subject to change. Compiled on October 18, 2025

Keywords: AI, Machine Learning for Real-time Traffic Flow Prediction and Management; AI, Machine Learning Techniques for Traffic Demand Forecasting

Abstract

Audio-based vehicle detection offers a cost-effective and scalable alternative to loop detectors and CCTV-based systems for traffic monitoring. This study compares Machine Learning (ML) and Deep Learning (DL) models for audio-based traffic monitoring under diverse acoustic conditions. The experiments used 41 audio features derived from transformations such as the Fast Fourier Transform and the Discrete Wavelet Transform, as well as from seven types of spectrograms and raw waveforms. Sixteen ML/DL methods were examined, including Random Forest (RF), XGBoost, and Extra Trees. Results on three benchmark datasets demonstrate the robustness of the ML methods: RF achieved up to 99.8% accuracy on the IDMT dataset, followed by XGBoost (97.5%) and Extra Trees (95.1%). In comparison, Convolutional Neural Network (CNN) models using spectrograms achieved 78.7%–94.7% accuracy. These findings suggest that ML models can outperform DL models on simpler tasks such as vehicle counting while saving significant time and computing resources. ML models perform well when audio cues are clear and consistent and datasets are small, and they have the added benefit of interpretability. In contrast, DL models excel at more complex tasks by learning directly from raw waveforms, capturing subtle patterns that extracted features may overlook, but they typically require large amounts of data.
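
As an illustration of what an FFT-derived audio feature can look like, the sketch below computes the share of spectral energy in a few frequency bands using only the Python standard library. It is not the authors' pipeline: the band-energy feature, the function names (fft, band_energies, tone), the clip length of 256 samples, and the synthetic sine-wave inputs are all assumptions made for this example, and the abstract does not say which of the 41 features were computed this way.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def band_energies(samples, n_bands=4):
    """Hypothetical feature: fraction of spectral energy in each of
    n_bands equal-width bands (DC bin excluded, Nyquist bin included)."""
    spectrum = fft(samples)
    half = len(samples) // 2
    power = [abs(c) ** 2 for c in spectrum[1:half + 1]]
    width = len(power) // n_bands
    bands = [sum(power[i * width:(i + 1) * width]) for i in range(n_bands)]
    total = sum(bands) or 1.0  # avoid division by zero on silent input
    return [b / total for b in bands]


def tone(freq_bin, n=256):
    """Synthetic stand-in for an audio clip: a sine wave centred on one FFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n) for t in range(n)]


# A low-pitched and a high-pitched tone yield clearly different feature vectors.
low = band_energies(tone(5))
high = band_energies(tone(100))
print([round(x, 3) for x in low])
print([round(x, 3) for x in high])
```

Fixed-length vectors of this kind are the sort of tabular input that tree-based classifiers such as Random Forest accept, whereas a CNN would typically consume the spectrogram or raw waveform directly.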