Automatic speech recognition of Arabic multi-genre broadcast media

Maryam Najafian, Wei Ning Hsu, Ahmed Ali, James Glass

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Citations (Scopus)

Abstract

This paper describes an Arabic Automatic Speech Recognition system developed on 15 hours of Multi-Genre Broadcast (MGB-3) data from YouTube, plus 1,200 hours of Multi-Dialect and Multi-Genre MGB-2 data recorded from the Aljazeera Arabic TV channel. In this paper, we report our investigations of a range of signal pre-processing, data augmentation, topic-specific language model adaptation, accent specific re-training, and deep learning based acoustic modeling topologies, such as feed-forward Deep Neural Networks (DNNs), Time-delay Neural Networks (TDNNs), Long Short-term Memory (LSTM) networks, Bidirectional LSTMs (BLSTMs), and a Bidirectional version of the Prioritized Grid LSTM (BPGLSTM) model. We propose a system combination for three purely sequence trained recognition systems based on lattice-free maximum mutual information, 4-gram language model re-scoring, and system combination using the minimum Bayes risk decoding criterion. The best word error rate we obtained on the MGB-3 Arabic development set using a 4-gram re-scoring strategy is 42.25% for a chain BLSTM system, compared to 65.44% baseline for a DNN system.

Original languageEnglish
Title of host publication2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages353-359
Number of pages7
Volume2018-January
ISBN (Electronic)9781509047888
DOIs
Publication statusPublished - 24 Jan 2018
Event2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 201720 Dec 2017

Other

Other2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
CountryJapan
CityOkinawa
Period16/12/1720/12/17

    Fingerprint

Keywords

  • Acoustic mis-match
  • multi-dialect
  • multi-genre
  • RNNs
  • Speech recognition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Najafian, M., Hsu, W. N., Ali, A., & Glass, J. (2018). Automatic speech recognition of Arabic multi-genre broadcast media. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings (Vol. 2018-January, pp. 353-359). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ASRU.2017.8268957