Word error rate estimation for speech recognition

e-WER

Ahmed Ali, Steve Renals

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)


Abstract

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: the ASR recognised text, character recognition results that complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, using 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9% across 1,400 sentences. The estimated overall WER (e-WER) was 25.3% on the three-hour test set, while the actual WER was 28.5%.
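The abstract reports three quantities: sentence-level WER, the RMSE between estimated and actual sentence WERs, and an overall WER for the whole test set. The Python sketch below illustrates only these standard metric definitions; it does not implement the e-WER estimator itself. The toy sentences and the "estimated" WER values are hypothetical, and the sketch assumes overall WER is computed as total errors divided by total reference words, the usual convention, which the abstract does not spell out.

```python
from math import sqrt
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of
    substitutions, deletions and insertions turning `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Sentence-level WER = (S + D + I) / N, where N is the number of
    reference words (assumes a non-empty reference)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Overall WER: total errors over total reference words, i.e. a
    word-weighted average rather than a plain mean of sentence WERs."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def rmse(estimated: List[float], actual: List[float]) -> float:
    """Root mean squared error between estimated and actual sentence WERs
    (assumes two non-empty lists of equal length)."""
    n = len(actual)
    return sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "hello world"]
    hyps = ["the cat sat on mat", "hello there world"]
    actual = [wer(r, h) for r, h in zip(refs, hyps)]
    print(actual)                   # [0.16666666666666666, 0.5]
    print(corpus_wer(refs, hyps))   # 0.25 (2 errors / 8 reference words)
    estimated = [0.2, 0.4]          # hypothetical per-sentence estimates
    print(round(rmse(estimated, actual), 4))  # 0.0745
```

The toy run shows why the aggregation choice matters: the plain mean of the two sentence WERs is about 0.333, whereas the word-weighted overall WER is 0.25, because the short second sentence contributes fewer reference words.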

Original language: English
Title of host publication: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)
Publisher: Association for Computational Linguistics (ACL)
Pages: 20-24
Number of pages: 5
ISBN (Electronic): 9781948087346
Publication status: Published, 1 Jan 2018
Event: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia
Duration: 15 Jul 2018 to 20 Jul 2018

Publication series

Name: ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume: 2

Conference

Conference: 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Country: Australia
City: Melbourne
Period: 15/7/18 to 20/7/18

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics

Cite this

Ali, A., & Renals, S. (2018). Word error rate estimation for speech recognition: e-WER. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers) (pp. 20-24). (ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers); Vol. 2). Association for Computational Linguistics (ACL).

Word error rate estimation for speech recognition: e-WER. / Ali, Ahmed; Renals, Steve.

ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers). Association for Computational Linguistics (ACL), 2018. p. 20-24 (ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers); Vol. 2).

Ali, A & Renals, S 2018, Word error rate estimation for speech recognition: e-WER. in ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers). ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), vol. 2, Association for Computational Linguistics (ACL), pp. 20-24, 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15/7/18.
Ali A, Renals S. Word error rate estimation for speech recognition: e-WER. In ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers). Association for Computational Linguistics (ACL). 2018. p. 20-24. (ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)).
Ali, Ahmed; Renals, Steve. / Word error rate estimation for speech recognition: e-WER. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers). Association for Computational Linguistics (ACL), 2018. pp. 20-24 (ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)).
@inproceedings{1fdd306089d64a5b9797f7198ba955f1,
title = "Word error rate estimation for speech recognition: {e-WER}",
abstract = "Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: the ASR recognised text, character recognition results that complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, using 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9{\%} across 1,400 sentences. The estimated overall WER (e-WER) was 25.3{\%} on the three-hour test set, while the actual WER was 28.5{\%}.",
author = "Ahmed Ali and Steve Renals",
year = "2018",
month = "1",
day = "1",
language = "English",
series = "ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",
publisher = "Association for Computational Linguistics (ACL)",
pages = "20--24",
booktitle = "ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)",

}

TY - GEN

T1 - Word error rate estimation for speech recognition

T2 - e-WER

AU - Ali, Ahmed

AU - Renals, Steve

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: the ASR recognised text, character recognition results that complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, using 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9% across 1,400 sentences. The estimated overall WER (e-WER) was 25.3% on the three-hour test set, while the actual WER was 28.5%.

AB - Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive. In this paper, we propose a novel approach to estimating WER, called e-WER, which does not require a gold-standard transcription of the test set. Our e-WER framework uses a comprehensive set of features: the ASR recognised text, character recognition results that complement the recognition output, and internal decoder features. We report results for two feature sets, black-box and glass-box, using 24 unseen Arabic broadcast programs. Our system achieves a WER root mean squared error (RMSE) of 16.9% across 1,400 sentences. The estimated overall WER (e-WER) was 25.3% on the three-hour test set, while the actual WER was 28.5%.

UR - http://www.scopus.com/inward/record.url?scp=85063158851&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063158851&partnerID=8YFLogxK

M3 - Conference contribution

T3 - ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)

SP - 20

EP - 24

BT - ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)

PB - Association for Computational Linguistics (ACL)

ER -