Anomaly detection based pronunciation verification approach using speech attribute features

Mostafa Shahin, Beena Ahmed

Research output: Contribution to journal (Article)

Abstract

Computer-aided pronunciation training tools require accurate automatic pronunciation error detection algorithms to identify errors made by their users. However, the performance of these algorithms is highly dependent on the amount of mispronounced speech data used to train them and the reliability of its manual annotation. To overcome this problem, we recast mispronunciation detection as an anomaly detection problem, which uses algorithms trained only on correctly pronounced speech data. In this work we adopted the One-Class SVM as our anomaly detection model, with a specific model built for each phoneme. Each model was fed with a set of speech attribute features, namely the manners and places of articulation, extracted from a bank of binary DNN speech attribute detectors. We also applied multi-task learning and dropout approaches to alleviate the overfitting problem in the DNN speech attribute detectors. We trained the system using the WSJ0 and TIMIT standard data sets, which contain only native English speech data, and then evaluated it using three different data sets: a native English speaker corpus with artificial errors, a foreign-accented speech corpus, and a children's disordered speech corpus. Finally, we compared our system with the conventional Goodness-of-Pronunciation (GOP) algorithm to demonstrate the effectiveness of our method. The results show that our method reduced the false-acceptance and false-rejection rates by 26% and 39%, respectively, compared to the GOP method.
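
The two error rates reported in the abstract can be illustrated with a minimal sketch. It assumes the usual verification definitions (a false acceptance is a mispronounced phoneme that the system accepts, a false rejection is a correctly pronounced phoneme that it rejects) and assumes the reported reductions are relative to the GOP baseline; the abstract does not state either explicitly. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def error_rates(is_correct, accepted):
    """Return (false-acceptance rate, false-rejection rate).

    is_correct[i] is True when phoneme i was truly pronounced correctly;
    accepted[i] is True when the verifier judged it correct.
    """
    n_wrong = sum(1 for ok in is_correct if not ok)
    n_right = sum(1 for ok in is_correct if ok)
    if n_wrong == 0 or n_right == 0:
        raise ValueError("need at least one correct and one mispronounced item")
    # Mispronounced phonemes that slipped through as "correct".
    false_accepts = sum(1 for ok, acc in zip(is_correct, accepted) if not ok and acc)
    # Correctly pronounced phonemes that were flagged as errors.
    false_rejects = sum(1 for ok, acc in zip(is_correct, accepted) if ok and not acc)
    return false_accepts / n_wrong, false_rejects / n_right


def relative_reduction(baseline, new):
    """Fraction by which `new` improves on `baseline` (0.26 means 26% lower)."""
    return (baseline - new) / baseline


# Toy data only: four correct and four mispronounced phonemes.
truth = [True, True, True, True, False, False, False, False]
baseline_decisions = [True, True, False, False, True, True, False, False]
new_decisions = [True, True, True, False, True, False, False, False]

base_far, base_frr = error_rates(truth, baseline_decisions)
new_far, new_frr = error_rates(truth, new_decisions)
far_gain = relative_reduction(base_far, new_far)
frr_gain = relative_reduction(base_frr, new_frr)
```

Under this reading, a 26% reduction means the new false-acceptance rate is 0.74 times the baseline rate, not 26 percentage points lower.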

Original language: English
Pages (from-to): 29-43
Number of pages: 15
Journal: Speech Communication
Volume: 111
DOIs: 10.1016/j.specom.2019.06.003
Publication status: Published - 1 Aug 2019

Fingerprint

Anomaly Detection
Attribute
Detector
Multi-task Learning
Speech
Anomaly
Error Detection
Overfitting
Dropout
Rejection
Annotation
Bank
Acceptance
Model
Binary
Dependent
Demonstrate

Keywords

  • Anomaly detection
  • One class SVM
  • Pronunciation verification
  • Speech attributes

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Anomaly detection based pronunciation verification approach using speech attribute features. / Shahin, Mostafa; Ahmed, Beena.

In: Speech Communication, Vol. 111, 01.08.2019, p. 29-43.

@article{e436843579b34c0a956bde349b26e3be,
title = "Anomaly detection based pronunciation verification approach using speech attribute features",
abstract = "Computer-aided pronunciation training tools require accurate automatic pronunciation error detection algorithms to identify errors made by their users. However, the performance of these algorithms is highly dependent on the amount of mispronounced speech data used to train them and the reliability of its manual annotation. To overcome this problem, we recast mispronunciation detection as an anomaly detection problem, which uses algorithms trained only on correctly pronounced speech data. In this work we adopted the One-Class SVM as our anomaly detection model, with a specific model built for each phoneme. Each model was fed with a set of speech attribute features, namely the manners and places of articulation, extracted from a bank of binary DNN speech attribute detectors. We also applied multi-task learning and dropout approaches to alleviate the overfitting problem in the DNN speech attribute detectors. We trained the system using the WSJ0 and TIMIT standard data sets, which contain only native English speech data, and then evaluated it using three different data sets: a native English speaker corpus with artificial errors, a foreign-accented speech corpus, and a children's disordered speech corpus. Finally, we compared our system with the conventional Goodness-of-Pronunciation (GOP) algorithm to demonstrate the effectiveness of our method. The results show that our method reduced the false-acceptance and false-rejection rates by 26{\%} and 39{\%}, respectively, compared to the GOP method.",
keywords = "Anomaly detection, One class SVM, Pronunciation verification, Speech attributes",
author = "Mostafa Shahin and Beena Ahmed",
year = "2019",
month = "8",
day = "1",
doi = "10.1016/j.specom.2019.06.003",
language = "English",
volume = "111",
pages = "29--43",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
}

TY - JOUR

T1 - Anomaly detection based pronunciation verification approach using speech attribute features

AU - Shahin, Mostafa

AU - Ahmed, Beena

PY - 2019/8/1

Y1 - 2019/8/1

N2 - Computer-aided pronunciation training tools require accurate automatic pronunciation error detection algorithms to identify errors made by their users. However, the performance of these algorithms is highly dependent on the amount of mispronounced speech data used to train them and the reliability of its manual annotation. To overcome this problem, we recast mispronunciation detection as an anomaly detection problem, which uses algorithms trained only on correctly pronounced speech data. In this work we adopted the One-Class SVM as our anomaly detection model, with a specific model built for each phoneme. Each model was fed with a set of speech attribute features, namely the manners and places of articulation, extracted from a bank of binary DNN speech attribute detectors. We also applied multi-task learning and dropout approaches to alleviate the overfitting problem in the DNN speech attribute detectors. We trained the system using the WSJ0 and TIMIT standard data sets, which contain only native English speech data, and then evaluated it using three different data sets: a native English speaker corpus with artificial errors, a foreign-accented speech corpus, and a children's disordered speech corpus. Finally, we compared our system with the conventional Goodness-of-Pronunciation (GOP) algorithm to demonstrate the effectiveness of our method. The results show that our method reduced the false-acceptance and false-rejection rates by 26% and 39%, respectively, compared to the GOP method.

KW - Anomaly detection

KW - One class SVM

KW - Pronunciation verification

KW - Speech attributes

UR - http://www.scopus.com/inward/record.url?scp=85067285318&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067285318&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2019.06.003

DO - 10.1016/j.specom.2019.06.003

M3 - Article

VL - 111

SP - 29

EP - 43

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

ER -