Paper WeAT17.8
Li, Zechen (Chongqing University), Tu, Huan (Chongqing University), Wang, Min (Chongqing University), Huang, Yuhui (Northwest Institute of Nuclear Technology), Liang, Shan (Chongqing University)
A Whisper-Based Dialect Speech Recognition Model for VHF Calls in Waterway Traffic
Scheduled for presentation during the Poster Session "Detection, estimation and prediction for intelligent transportation systems" (WeAT17), Wednesday, September 25, 2024,
10:30−12:30, Foyer
2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC), September 24-27, 2024, Edmonton, Canada
This information is tentative and subject to change. Compiled on October 14, 2024
Keywords: Ports, Waterways, Inland Navigation, and Vessel Traffic Management; Data Mining and Data Analysis; Other Theories, Applications, and Technologies
Abstract
Very High Frequency (VHF) radio is the most widely used means of real-time voice communication and plays an extremely important role in waterway transportation. Existing Automatic Speech Recognition (ASR) systems typically prioritize achieving higher recognition accuracy. However, the speech captured by ship-borne VHF equipment is often accompanied by horn blasts, vessel engine noise, and other interference, which substantially degrades their effectiveness. Furthermore, with the development of the shipping industry, cross-regional and cross-national voyages have become a trend, and single-language ASR models are no longer sufficient to meet the communication demands of waterway traffic. After gathering two months of actual traffic command voice data from the upper Yangtze River, we developed speech annotation software and established a golden dataset. We then designed a pre-processing method with an adjustable aggressiveness mode tailored to the characteristics of VHF signals. Finally, we fine-tuned a large-scale pre-trained ASR model and used it as a converter for information exchange in maritime communication, enabling cross-lingual voice interaction and validating the effectiveness of the proposed approach. The experimental results further demonstrate that the method reduces the hardware resource consumption of the model.
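
The sketch below illustrates the general shape of such a pipeline: noise-robust pre-processing with an adjustable aggressiveness knob in front of a pre-trained Whisper recognizer. It is a minimal illustration only, assuming a WebRTC-style voice activity detector (webrtcvad) whose aggressiveness parameter stands in for the paper's "adjustable aggressive mode" and an off-the-shelf openai-whisper checkpoint ("small") in place of the paper's fine-tuned dialect model; the library choices, checkpoint name, and parameter values are assumptions, not details taken from the paper.

# Hypothetical sketch: VAD-based pre-processing with an adjustable
# aggressiveness mode, followed by transcription with a pre-trained
# Whisper checkpoint. Library choices (webrtcvad, openai-whisper) and
# all parameter values are assumptions, not taken from the paper.
import wave

import numpy as np
import webrtcvad   # pip install webrtcvad
import whisper     # pip install openai-whisper


def filter_speech(pcm: bytes, sample_rate: int = 16000,
                  aggressiveness: int = 2, frame_ms: int = 30) -> bytes:
    """Keep only the frames the VAD classifies as speech.

    aggressiveness: 0 (least aggressive) .. 3 (most aggressive), the
    knob standing in for the "adjustable aggressive mode" above.
    """
    vad = webrtcvad.Vad(aggressiveness)
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 16-bit mono PCM
    kept = []
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if vad.is_speech(frame, sample_rate):
            kept.append(frame)
    return b"".join(kept)


def transcribe_vhf(wav_path: str, aggressiveness: int = 2) -> str:
    # Load 16 kHz, 16-bit mono PCM audio recorded from the VHF receiver.
    with wave.open(wav_path, "rb") as wf:
        assert wf.getframerate() == 16000 and wf.getnchannels() == 1
        pcm = wf.readframes(wf.getnframes())

    # Pre-processing: drop non-speech frames (horn blasts, engine-only
    # segments) before they reach the recognizer.
    speech = filter_speech(pcm, aggressiveness=aggressiveness)

    # Convert to float32 in [-1, 1] as expected by Whisper, then decode
    # with a pre-trained (here: generic, not fine-tuned) checkpoint.
    audio = np.frombuffer(speech, dtype=np.int16).astype(np.float32) / 32768.0
    model = whisper.load_model("small")  # placeholder for the paper's fine-tuned model
    return model.transcribe(audio, language="zh")["text"]


if __name__ == "__main__":
    print(transcribe_vhf("vhf_call.wav", aggressiveness=3))

In a setup like this, the aggressiveness value trades noise rejection against the risk of clipping quiet speech, which is why exposing it as a tunable parameter is attractive for noisy ship-borne VHF channels.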