Towards variability resistant dialectal speech evaluation

Ahmed Ali, Salam Khalifa, Nizar Habash

Research output: Contribution to journal › Conference article

Abstract

We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Specifically targeting the case of Arabic dialects, which are also morphologically rich and complex, we propose a number of alternative WER-based metrics that vary in terms of text representation, including different degrees of morphological abstraction and spelling normalization. We evaluate the efficacy of these metrics by comparing their correlation with human judgments on a validation set of 1,000 utterances. Our results show that the use of morphological abstractions and spelling normalization produces systems with higher correlation with human judgment. We released the code and the datasets to the research community.
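
As a concrete illustration of the spelling-normalization axis, here is a minimal sketch in Python (not the authors' released code): it scores the same hypothesis once with standard word-level WER and once after collapsing common Arabic spelling variants (Alef-Hamza forms, Alef-Maqsura, Ta-Marbuta, diacritics) into canonical forms. The normalization rules and the example sentence pair are illustrative assumptions; the paper's exact normalization scheme, and its morphological-abstraction variants (e.g., scoring over segmented morphemes or lemmas), may differ.

# Minimal sketch, assuming common Arabic normalization conventions;
# not the authors' released implementation.
import re

# Arabic diacritics (tanween, harakat, shadda, sukun) and dagger Alef
DIACRITICS = re.compile(r'[\u064B-\u0652\u0670]')

def normalize(word: str) -> str:
    """Collapse common Arabic spelling variants into one canonical form."""
    word = DIACRITICS.sub('', word)
    word = re.sub('[أإآٱ]', 'ا', word)  # Alef-Hamza variants -> bare Alef
    word = word.replace('ى', 'ي')       # Alef-Maqsura -> Ya
    word = word.replace('ة', 'ه')       # Ta-Marbuta -> Ha
    return word

def wer(ref: list[str], hyp: list[str]) -> float:
    """Standard word error rate via Levenshtein distance over tokens."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

ref = 'ذهبت إلى المدرسة'.split()   # "I went to the school" (reference)
hyp = 'ذهبت الى المدرسه'.split()   # same words, variant spellings (hypothesis)
print(wer(ref, hyp))                                  # raw WER: 2/3
print(wer([normalize(w) for w in ref],
          [normalize(w) for w in hyp]))               # normalized WER: 0.0

On this toy pair, raw WER is 2/3 because the variant spellings of "to" and "the school" mismatch, while normalized WER drops to 0 — exactly the kind of orthographic variability that inflates standard WER on dialectal transcripts.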

Original language: English
Pages (from-to): 336-340
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2019-September
DOI: 10.21437/Interspeech.2019-2692
Publication status: Published - 1 Jan 2019
Event: 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria
Duration: 15 Sep 2019 - 19 Sep 2019

Keywords

  • ASR
  • Dialects
  • Evaluation
  • Metrics
  • Non-standard Orthography

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Towards variability resistant dialectal speech evaluation. / Ali, Ahmed; Khalifa, Salam; Habash, Nizar.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2019-September, 01.01.2019, p. 336-340.

Research output: Contribution to journal › Conference article

@article{8ec71be07f634bb384487c505b76c35c,
    title = "Towards variability resistant dialectal speech evaluation",
    abstract = "We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Specifically targeting the case of Arabic dialects, which are also morphologically rich and complex, we propose a number of alternative WER-based metrics that vary in terms of text representation, including different degrees of morphological abstraction and spelling normalization. We evaluate the efficacy of these metrics by comparing their correlation with human judgments on a validation set of 1,000 utterances. Our results show that the use of morphological abstractions and spelling normalization produces systems with higher correlation with human judgment. We released the code and the datasets to the research community.",
    keywords = "ASR, Dialects, Evaluation, Metrics, Non-standard Orthography",
    author = "Ahmed Ali and Salam Khalifa and Nizar Habash",
    year = "2019",
    month = "1",
    day = "1",
    doi = "10.21437/Interspeech.2019-2692",
    language = "English",
    volume = "2019-September",
    pages = "336--340",
    journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
    issn = "2308-457X",
}
