QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition

MGB-2 challenge

Sameer Khurana, Ahmed Ali

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Citations (Scopus)

Abstract

In this paper, we describe the Qatar Computing Research Institute's (QCRI) speech transcription system for the 2016 Dialectal Arabic Multi-Genre Broadcast (MGB-2) challenge. MGB-2 is a controlled evaluation using 1,200 hours of audio with lightly supervised transcription. Our system, which was a combination of three purely sequence trained recognition systems, achieved the lowest WER, 14.2%, among the nine participating teams. Key features of our transcription system are: purely sequence trained acoustic models using the recently introduced Lattice-free Maximum Mutual Information (LF-MMI) modeling framework; language model rescoring using four-gram and Recurrent Neural Network with Max-Ent connections (RNNME) language models; and system combination using the Minimum Bayes Risk (MBR) decoding criterion. The whole system is built using the Kaldi speech recognition toolkit.
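The 14.2% figure above is a word error rate (WER). As a rough illustration of how that metric is conventionally defined (word-level edit distance divided by the number of reference words), here is a minimal Python sketch. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; this is not the challenge's scoring tool, which may apply text normalization not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.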

Original language: English
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 292-298
Number of pages: 7
ISBN (Electronic): 9781509049035
DOIs: https://doi.org/10.1109/SLT.2016.7846279
Publication status: Published - 7 Feb 2017
Event: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - San Diego, United States
Duration: 13 Dec 2016 to 16 Dec 2016

Other

Other: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016
Country: United States
City: San Diego
Period: 13/12/16 to 16/12/16

Fingerprint

Transcription
Recurrent neural networks
Speech recognition
Decoding
Acoustics

Keywords

  • Arabic Speech Recognition
  • Bi-directional LSTM
  • Kaldi
  • Purely sequence trained acoustic models
  • QATS
  • RNN LM

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Khurana, S., & Ali, A. (2017). QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge. In 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings (pp. 292-298). [7846279] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT.2016.7846279

QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition : MGB-2 challenge. / Khurana, Sameer; Ali, Ahmed.

2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. p. 292-298 7846279.

Khurana, S & Ali, A 2017, QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge. in 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings., 7846279, Institute of Electrical and Electronics Engineers Inc., pp. 292-298, 2016 IEEE Workshop on Spoken Language Technology, SLT 2016, San Diego, United States, 13/12/16. https://doi.org/10.1109/SLT.2016.7846279
Khurana S, Ali A. QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge. In 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2017. p. 292-298. 7846279 https://doi.org/10.1109/SLT.2016.7846279
Khurana, Sameer ; Ali, Ahmed. / QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition : MGB-2 challenge. 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 292-298
@inproceedings{d3ce1977c78b4ee7b091ca09184437fd,
title = "QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge",
abstract = "In this paper, we describe the Qatar Computing Research Institute's (QCRI) speech transcription system for the 2016 Dialectal Arabic Multi-Genre Broadcast (MGB-2) challenge. MGB-2 is a controlled evaluation using 1,200 hours of audio with lightly supervised transcription. Our system, which was a combination of three purely sequence trained recognition systems, achieved the lowest WER, 14.2{\%}, among the nine participating teams. Key features of our transcription system are: purely sequence trained acoustic models using the recently introduced Lattice-free Maximum Mutual Information (LF-MMI) modeling framework; language model rescoring using four-gram and Recurrent Neural Network with Max-Ent connections (RNNME) language models; and system combination using the Minimum Bayes Risk (MBR) decoding criterion. The whole system is built using the Kaldi speech recognition toolkit.",
keywords = "Arabic Speech Recognition, Bi-directional LSTM, Kaldi, Purely sequence trained acoustic models, QATS, RNN LM",
author = "Sameer Khurana and Ahmed Ali",
year = "2017",
month = "2",
day = "7",
doi = "10.1109/SLT.2016.7846279",
language = "English",
pages = "292--298",
booktitle = "2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition

T2 - MGB-2 challenge

AU - Khurana, Sameer

AU - Ali, Ahmed

PY - 2017/2/7

Y1 - 2017/2/7

N2 - In this paper, we describe the Qatar Computing Research Institute's (QCRI) speech transcription system for the 2016 Dialectal Arabic Multi-Genre Broadcast (MGB-2) challenge. MGB-2 is a controlled evaluation using 1,200 hours of audio with lightly supervised transcription. Our system, which was a combination of three purely sequence trained recognition systems, achieved the lowest WER, 14.2%, among the nine participating teams. Key features of our transcription system are: purely sequence trained acoustic models using the recently introduced Lattice-free Maximum Mutual Information (LF-MMI) modeling framework; language model rescoring using four-gram and Recurrent Neural Network with Max-Ent connections (RNNME) language models; and system combination using the Minimum Bayes Risk (MBR) decoding criterion. The whole system is built using the Kaldi speech recognition toolkit.

AB - In this paper, we describe the Qatar Computing Research Institute's (QCRI) speech transcription system for the 2016 Dialectal Arabic Multi-Genre Broadcast (MGB-2) challenge. MGB-2 is a controlled evaluation using 1,200 hours of audio with lightly supervised transcription. Our system, which was a combination of three purely sequence trained recognition systems, achieved the lowest WER, 14.2%, among the nine participating teams. Key features of our transcription system are: purely sequence trained acoustic models using the recently introduced Lattice-free Maximum Mutual Information (LF-MMI) modeling framework; language model rescoring using four-gram and Recurrent Neural Network with Max-Ent connections (RNNME) language models; and system combination using the Minimum Bayes Risk (MBR) decoding criterion. The whole system is built using the Kaldi speech recognition toolkit.

KW - Arabic Speech Recognition

KW - Bi-directional LSTM

KW - Kaldi

KW - Purely sequence trained acoustic models

KW - QATS

KW - RNN LM

UR - http://www.scopus.com/inward/record.url?scp=85016038956&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016038956&partnerID=8YFLogxK

U2 - 10.1109/SLT.2016.7846279

DO - 10.1109/SLT.2016.7846279

M3 - Conference contribution

SP - 292

EP - 298

BT - 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -