Anomaly detection approach for pronunciation verification of disordered speech using speech attribute features

Mostafa Shahin, Beena Ahmed, Jim X. Ji, Kirrie Ballard

Research output: Contribution to journal › Conference article

Abstract

The automatic assessment of speech is a powerful tool in computer-aided speech therapy for disorders such as Childhood Apraxia of Speech (CAS). However, the lack of sufficient annotated disordered speech data seriously impedes the accurate detection of pronunciation errors. To address this limitation, we propose a novel approach that treats pronunciation verification as an anomaly detection problem. We model only the correct pronunciation of each phoneme, using a one-class Support Vector Machine (SVM) trained on a set of speech attribute features, namely the manner and place of articulation. These features are extracted from a bank of pre-trained Deep Neural Network (DNN) speech attribute classifiers. The one-class SVM then classifies each phoneme production as normal (correct) or anomalous (incorrect). We evaluated the system on both native speech with artificially introduced errors and disordered speech collected from children with apraxia of speech, and compared it with the DNN-based Goodness of Pronunciation (GOP) algorithm. The results show that our approach reduces the false-rejection rate by around 35% when applied to disordered speech.
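The per-phoneme verification idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's `OneClassSVM` with an RBF kernel, a hypothetical eight-entry attribute inventory (`ATTRIBUTES`), mean pooling of frame-level attribute posteriors into one vector per phoneme segment, and synthetic data standing in for real DNN attribute-classifier outputs. The helper names (`segment_features`, `train_phoneme_models`, `verify`, `rejection_rate`, `fake_segments`) are hypothetical. Each phoneme gets its own model trained only on correct productions, and a production predicted as an outlier (`-1`) is rejected as a likely pronunciation error.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical attribute inventory (manner + place); the paper's actual set may differ.
ATTRIBUTES = ["fricative", "stop", "nasal", "vowel",
              "alveolar", "labial", "velar", "silence"]


def segment_features(frame_posteriors):
    """Average frame-level attribute posteriors (frames x attributes) into one
    vector per phoneme segment. Mean pooling is an assumption for illustration."""
    return np.asarray(frame_posteriors, dtype=float).mean(axis=0)


def train_phoneme_models(correct_segments, nu=0.05):
    """Fit one one-class SVM per phoneme on correct productions only.
    `nu` upper-bounds the fraction of training segments treated as outliers."""
    return {ph: OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(X)
            for ph, X in correct_segments.items()}


def verify(models, phoneme, features):
    """True = accepted as correct (inlier); False = flagged as an error (anomaly)."""
    return bool(models[phoneme].predict(features.reshape(1, -1))[0] == 1)


def rejection_rate(models, phoneme, X):
    """Fraction of segments in X that the phoneme's model rejects (predict == -1)."""
    return float(np.mean(models[phoneme].predict(X) == -1))


# --- Synthetic demo: hypothetical stand-ins for real DNN attribute posteriors ---
rng = np.random.default_rng(0)
S_PROFILE = np.array([0.9, 0.05, 0.02, 0.03, 0.85, 0.05, 0.05, 0.05])  # /s/: alveolar fricative
T_PROFILE = np.array([0.05, 0.9, 0.02, 0.03, 0.85, 0.05, 0.05, 0.05])  # /t/: alveolar stop


def fake_segments(profile, n, n_frames=10):
    """Draw n segments, each pooled from n_frames noisy posterior frames clipped to [0, 1]."""
    noise = rng.normal(0.0, 0.15, size=(n, n_frames, profile.size))
    frames = np.clip(profile + noise, 0.0, 1.0)
    return np.array([segment_features(f) for f in frames])


# Train on correct /s/ productions only; no error examples are needed.
models = train_phoneme_models({"s": fake_segments(S_PROFILE, 200)})
frr = rejection_rate(models, "s", fake_segments(S_PROFILE, 100))     # correct /s/ wrongly rejected
detect = rejection_rate(models, "s", fake_segments(T_PROFILE, 100))  # /t/-for-/s/ substitutions caught
print(f"false-rejection rate: {frr:.2f}, substitution detection rate: {detect:.2f}")
```

Because each model sees only correct productions, training needs no annotated disordered speech, which is the motivation given in the abstract. The trade-off is that `nu` (an arbitrary 0.05 here) roughly sets how many correct productions get rejected, so in practice it would be tuned against false-rejection and false-acceptance rates on held-out data.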

Original language: English
Pages (from-to): 1671-1675
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOIs: 10.21437/Interspeech.2018-1319
Publication status: Published - 1 Jan 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: 2 Sep 2018 to 6 Sep 2018

Keywords

  • Deep learning
  • Disordered speech
  • One class SVM
  • Pronunciation verification
  • Speech attributes

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Anomaly detection approach for pronunciation verification of disordered speech using speech attribute features. / Shahin, Mostafa; Ahmed, Beena; Ji, Jim X.; Ballard, Kirrie.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 01.01.2018, p. 1671-1675.

@article{68618d6221eb430490b3f635c08838fd,
title = "Anomaly detection approach for pronunciation verification of disordered speech using speech attribute features",
abstract = "The automatic assessment of speech is a powerful tool in computer-aided speech therapy for disorders such as Childhood Apraxia of Speech (CAS). However, the lack of sufficient annotated disordered speech data seriously impedes the accurate detection of pronunciation errors. To address this limitation, we propose a novel approach that treats pronunciation verification as an anomaly detection problem. We model only the correct pronunciation of each phoneme, using a one-class Support Vector Machine (SVM) trained on a set of speech attribute features, namely the manner and place of articulation. These features are extracted from a bank of pre-trained Deep Neural Network (DNN) speech attribute classifiers. The one-class SVM then classifies each phoneme production as normal (correct) or anomalous (incorrect). We evaluated the system on both native speech with artificially introduced errors and disordered speech collected from children with apraxia of speech, and compared it with the DNN-based Goodness of Pronunciation (GOP) algorithm. The results show that our approach reduces the false-rejection rate by around 35{\%} when applied to disordered speech.",
keywords = "Deep learning, Disordered speech, One class SVM, Pronunciation verification, Speech attributes",
author = "Mostafa Shahin and Beena Ahmed and Ji, {Jim X.} and Kirrie Ballard",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-1319",
language = "English",
volume = "2018-September",
pages = "1671--1675",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY  - JOUR
T1  - Anomaly detection approach for pronunciation verification of disordered speech using speech attribute features
AU  - Shahin, Mostafa
AU  - Ahmed, Beena
AU  - Ji, Jim X.
AU  - Ballard, Kirrie
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - The automatic assessment of speech is a powerful tool in computer-aided speech therapy for disorders such as Childhood Apraxia of Speech (CAS). However, the lack of sufficient annotated disordered speech data seriously impedes the accurate detection of pronunciation errors. To address this limitation, we propose a novel approach that treats pronunciation verification as an anomaly detection problem. We model only the correct pronunciation of each phoneme, using a one-class Support Vector Machine (SVM) trained on a set of speech attribute features, namely the manner and place of articulation. These features are extracted from a bank of pre-trained Deep Neural Network (DNN) speech attribute classifiers. The one-class SVM then classifies each phoneme production as normal (correct) or anomalous (incorrect). We evaluated the system on both native speech with artificially introduced errors and disordered speech collected from children with apraxia of speech, and compared it with the DNN-based Goodness of Pronunciation (GOP) algorithm. The results show that our approach reduces the false-rejection rate by around 35% when applied to disordered speech.
AB  - The automatic assessment of speech is a powerful tool in computer-aided speech therapy for disorders such as Childhood Apraxia of Speech (CAS). However, the lack of sufficient annotated disordered speech data seriously impedes the accurate detection of pronunciation errors. To address this limitation, we propose a novel approach that treats pronunciation verification as an anomaly detection problem. We model only the correct pronunciation of each phoneme, using a one-class Support Vector Machine (SVM) trained on a set of speech attribute features, namely the manner and place of articulation. These features are extracted from a bank of pre-trained Deep Neural Network (DNN) speech attribute classifiers. The one-class SVM then classifies each phoneme production as normal (correct) or anomalous (incorrect). We evaluated the system on both native speech with artificially introduced errors and disordered speech collected from children with apraxia of speech, and compared it with the DNN-based Goodness of Pronunciation (GOP) algorithm. The results show that our approach reduces the false-rejection rate by around 35% when applied to disordered speech.
KW  - Deep learning
KW  - Disordered speech
KW  - One class SVM
KW  - Pronunciation verification
KW  - Speech attributes
UR  - http://www.scopus.com/inward/record.url?scp=85054999963&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054999963&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2018-1319
DO  - 10.21437/Interspeech.2018-1319
M3  - Conference article
AN  - SCOPUS:85054999963
VL  - 2018-September
SP  - 1671
EP  - 1675
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  - 