WERD: Using social text spelling variants for evaluating dialectal speech recognition

Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.

Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 141-148
Number of pages: 8
Volume: 2018-January
ISBN (Electronic): 9781509047888
DOI: 10.1109/ASRU.2017.8268928
Publication status: Published - 24 Jan 2018
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 - 20 Dec 2017

Keywords

  • ASR evaluation
  • Automatic speech recognition
  • dialectal ASR
  • multi-reference WER
  • word error rate

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Ali, A., Nakov, P., Bell, P., & Renals, S. (2018). WERD: Using social text spelling variants for evaluating dialectal speech recognition. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings (Vol. 2018-January, pp. 141-148). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2017.8268928

@inproceedings{ce55e04428c94baab4d1d35cdf471011,
title = "WERD: Using social text spelling variants for evaluating dialectal speech recognition",
abstract = "We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.",
keywords = "ASR evaluation, Automatic speech recognition, dialectal ASR, multi-reference WER, word error rate",
author = "Ahmed Ali and Preslav Nakov and Peter Bell and Steve Renals",
year = "2018",
month = jan,
day = "24",
doi = "10.1109/ASRU.2017.8268928",
language = "English",
volume = "2018-January",
pages = "141--148",
booktitle = "2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - WERD

T2 - Using social text spelling variants for evaluating dialectal speech recognition

AU - Ali, Ahmed

AU - Nakov, Preslav

AU - Bell, Peter

AU - Renals, Steve

PY - 2018/1/24

Y1 - 2018/1/24

N2 - We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.

AB - We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.

KW - ASR evaluation

KW - Automatic speech recognition

KW - dialectal ASR

KW - multi-reference WER

KW - word error rate

UR - http://www.scopus.com/inward/record.url?scp=85050551927&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050551927&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2017.8268928

DO - 10.1109/ASRU.2017.8268928

M3 - Conference contribution

VL - 2018-January

SP - 141

EP - 148

BT - 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -