Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera

Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James Glass, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. The different approaches were all trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results for two different types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on this test set is 25.6%.
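For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.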

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech and Communication Association
Pages: 2088-2092
Number of pages: 5
Publication status: Published - 2014
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 14 Sep 2014 → 18 Sep 2014

Other

Other: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Country: Singapore
City: Singapore
Period: 14/9/14 → 18/9/14

Fingerprint

Transcription
Broadcast
Error Rate
Speaker Adaptation
Test Set
Speech Recognition
Neural Networks
Scenarios
Training

Keywords

  • Arabic
  • ASR system
  • Kaldi

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Cardinal, P., Ali, A., Dehak, N., Zhang, Y., Al Hanai, T., Zhang, Y., ... Vogel, S. (2014). Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 2088-2092). International Speech and Communication Association.

Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera. / Cardinal, Patrick; Ali, Ahmed; Dehak, Najim; Zhang, Yu; Al Hanai, Tuka; Zhang, Yifan; Glass, James; Vogel, Stephan.

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, 2014. p. 2088-2092.

Cardinal, P, Ali, A, Dehak, N, Zhang, Y, Al Hanai, T, Zhang, Y, Glass, J & Vogel, S 2014, Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera. in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, pp. 2088-2092, 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, Singapore, Singapore, 14/9/14.
Cardinal P, Ali A, Dehak N, Zhang Y, Al Hanai T, Zhang Y et al. Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association. 2014. p. 2088-2092
Cardinal, Patrick ; Ali, Ahmed ; Dehak, Najim ; Zhang, Yu ; Al Hanai, Tuka ; Zhang, Yifan ; Glass, James ; Vogel, Stephan. / Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech and Communication Association, 2014. pp. 2088-2092
@inproceedings{981c2335231b42088baa10fd60b63995,
title = "Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera",
abstract = "This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. The different approaches were all trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results for two different types of test data: broadcast news reports, with a best word error rate (WER) of 17.86{\%}, and broadcast conversations, with a best WER of 29.85{\%}. The overall WER on this test set is 25.6{\%}.",
keywords = "Arabic, ASR system, Kaldi",
author = "Patrick Cardinal and Ahmed Ali and Najim Dehak and Yu Zhang and {Al Hanai}, Tuka and Yifan Zhang and James Glass and Stephan Vogel",
year = "2014",
language = "English",
pages = "2088--2092",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
publisher = "International Speech and Communication Association",

}

TY - GEN

T1 - Recent advances in ASR Applied to an Arabic transcription system for Al-Jazeera

AU - Cardinal, Patrick

AU - Ali, Ahmed

AU - Dehak, Najim

AU - Zhang, Yu

AU - Al Hanai, Tuka

AU - Zhang, Yifan

AU - Glass, James

AU - Vogel, Stephan

PY - 2014

Y1 - 2014

N2 - This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. The different approaches were all trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results for two different types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on this test set is 25.6%.

AB - This paper describes a detailed comparison of several state-of-the-art speech recognition techniques applied to a limited Arabic broadcast news dataset. The different approaches were all trained on 50 hours of transcribed audio from the Al-Jazeera news channel. The best results were obtained using i-vector-based speaker adaptation in a training scenario using the Minimum Phone Error (MPE) criterion combined with sequential Deep Neural Network (DNN) training. We report results for two different types of test data: broadcast news reports, with a best word error rate (WER) of 17.86%, and broadcast conversations, with a best WER of 29.85%. The overall WER on this test set is 25.6%.

KW - Arabic

KW - ASR system

KW - Kaldi

UR - http://www.scopus.com/inward/record.url?scp=84910093677&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910093677&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84910093677

SP - 2088

EP - 2092

BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

PB - International Speech and Communication Association

ER -