One-Class SVMs Based Pronunciation Verification Approach

Mostafa Shahin, Jim X. Ji, Beena Ahmed

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The automatic assessment of speech plays an important role in Computer Aided Pronunciation Learning systems. However, modeling both the correct and incorrect pronunciation of each phoneme to achieve accurate pronunciation verification is infeasible because training datasets lack sufficient mispronounced samples. In this paper, we propose a novel approach that handles this unbalanced data distribution by building multiple one-class SVMs to evaluate each phoneme as correct or incorrect. We model the correct pronunciation of each individual phoneme with a one-class SVM trained on a set of speech attribute features, namely the manner and place of articulation. These features are extracted from a bank of pre-trained DNN speech attribute classifiers. The one-class SVM model measures the similarity between the new data and the training set and then classifies it as normal (correct) or as an anomaly (incorrect). We evaluated the system on a native speech corpus and a disordered speech corpus and compared it with the conventional Goodness of Pronunciation (GOP) algorithm. The results show that our approach reduces the false-acceptance and false-rejection rates by around 26% and 39%, respectively.
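The per-phoneme verification idea described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes scikit-learn's `OneClassSVM`, replaces the DNN speech attribute classifiers with synthetic feature vectors, and uses hypothetical values for the feature size, kernel, and `nu`, none of which come from the paper. The names `train_phoneme_models`, `verify`, and `error_rates` are made up for this example.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
N_ATTRIBUTES = 8  # hypothetical size of the speech-attribute feature vector


def train_phoneme_models(correct_features, nu=0.1):
    """Fit one one-class SVM per phoneme, using correct pronunciations only."""
    models = {}
    for phoneme, feats in correct_features.items():
        # kernel, gamma and nu are illustrative choices, not the paper's settings
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        models[phoneme] = model.fit(feats)
    return models


def verify(models, phoneme, feature_vector):
    """Return True if the segment is accepted as a correct pronunciation."""
    # predict() yields +1 for inliers (correct) and -1 for outliers (incorrect)
    label = models[phoneme].predict(feature_vector.reshape(1, -1))[0]
    return int(label) == 1


def error_rates(accepted, is_correct):
    """False-acceptance and false-rejection rates of a set of decisions."""
    accepted = np.asarray(accepted, dtype=bool)
    is_correct = np.asarray(is_correct, dtype=bool)
    far = float(np.mean(accepted[~is_correct]))  # mispronunciations accepted
    frr = float(np.mean(~accepted[is_correct]))  # correct pronunciations rejected
    return far, frr


def sample(center, n):
    """Synthetic stand-in for attribute features extracted by the DNN bank."""
    return center + 0.05 * rng.standard_normal((n, N_ATTRIBUTES))


# Synthetic attribute profiles for a correct and a mispronounced phoneme "s".
center_correct = np.full(N_ATTRIBUTES, 0.2)
center_correct[:3] = 0.9
center_wrong = np.full(N_ATTRIBUTES, 0.2)
center_wrong[5:] = 0.9

# Training uses correct samples only; mispronunciations appear only at test time.
models = train_phoneme_models({"s": sample(center_correct, 200)})

segments = np.vstack([sample(center_correct, 50), sample(center_wrong, 50)])
is_correct = np.array([True] * 50 + [False] * 50)
accepted = [verify(models, "s", seg) for seg in segments]

far, frr = error_rates(accepted, is_correct)
print(f"FAR={far:.2f}  FRR={frr:.2f}")
```

Because each model sees only correct pronunciations during training, no mispronounced samples are needed to fit it, which is the property the abstract relies on to cope with the unbalanced data. In this sketch `nu` roughly bounds the fraction of training samples treated as outliers, so it trades false rejections against false acceptances.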

Original language: English
Title of host publication: 2018 24th International Conference on Pattern Recognition, ICPR 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2881-2886
Number of pages: 6
Volume: 2018-August
ISBN (Electronic): 9781538637883
DOI: https://doi.org/10.1109/ICPR.2018.8545687
Publication status: Published - 26 Nov 2018
Event: 24th International Conference on Pattern Recognition, ICPR 2018 - Beijing, China
Duration: 20 Aug 2018 to 24 Aug 2018

Other

Other: 24th International Conference on Pattern Recognition, ICPR 2018
Country: China
City: Beijing
Period: 20/8/18 to 24/8/18

Fingerprint

Learning systems
Classifiers

Keywords

  • deep learning
  • one class SVM
  • pronunciation verification
  • speech attributes

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

Cite this

Shahin, M., Ji, J. X., & Ahmed, B. (2018). One-Class SVMs Based Pronunciation Verification Approach. In 2018 24th International Conference on Pattern Recognition, ICPR 2018 (Vol. 2018-August, pp. 2881-2886). [8545687] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR.2018.8545687
