Classification of bisyllabic lexical stress patterns in disordered speech using deep learning

Mostafa Shahin, Ricardo Gutierrez-Osuna, Beena Ahmed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Technology-based therapy tools can be of great benefit to children with developmental speech disabilities, as they typically require sustained practice with a speech therapist for several years. Towards this aim, over the past 4 years we have developed speech processing tools to automatically detect common errors in disordered speech. This paper presents an automated technique to identify incorrect lexical stress. Specifically, we describe a deep neural network (DNN) that can be used to classify the four different bisyllabic stress patterns: strong-weak (SW), weak-strong (WS), strong-strong (SS) and weak-weak (WW). We derive input features for the DNN from the duration, pitch, intensity and spectral energy on each of the two consecutive syllables. Using these features, we achieve 93% correct classification between SW/WS stress patterns and 88% correct classification of the four bisyllabic patterns on speech from typically developing children, while we obtain 73.4% correct classification between SW/WS in disordered speech. These figures represent a two-fold reduction in error rates compared to our prior work, which used a DNN with differential features from consecutive syllables.
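The input layout described in the abstract (features from each of the two consecutive syllables, rather than differences between them) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of one summary value per cue, the function names, the layer sizes, and the random untrained weights. It shows only the shape of the computation, a concatenated 8-dimensional input passed through a small feed-forward network with a 4-way softmax over SW, WS, SS and WW.

```python
import math
import random

PATTERNS = ["SW", "WS", "SS", "WW"]  # the four bisyllabic stress patterns

def syllable_features(duration_s, pitch_hz, intensity_db, spectral_energy):
    """Pack one syllable's measurements into a list (hypothetical layout)."""
    return [duration_s, pitch_hz, intensity_db, spectral_energy]

def pair_features(first, second):
    """Concatenate both syllables' features; no differencing is applied."""
    return first + second

def dense(x, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random small weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def classify(x, layers):
    """Forward pass: ReLU hidden layers, softmax output over PATTERNS."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    probs = softmax(dense(hidden, weights, biases))
    return PATTERNS[probs.index(max(probs))], probs

rng = random.Random(0)
# Assumed sizes: 8 inputs, two hidden layers of 16 units, 4 outputs.
layers = [init_layer(8, 16, rng), init_layer(16, 16, rng), init_layer(16, 4, rng)]

# Made-up measurements for two consecutive syllables.
x = pair_features(
    syllable_features(0.21, 240.0, 68.0, 0.9),
    syllable_features(0.12, 200.0, 61.0, 0.4),
)
label, probs = classify(x, layers)
print(label, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label is arbitrary. A real system would normalise the features and learn the weights from labelled syllable pairs, and restricting the output layer to two units would give the SW/WS task.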

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6480-6484
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: https://doi.org/10.1109/ICASSP.2016.7472925
Publication status: Published - 18 May 2016
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country: China
City: Shanghai
Period: 20/3/16 to 25/3/16

Fingerprint

Speech processing
Deep learning
Deep neural networks

Keywords

  • automated speech therapy
  • deep neural network
  • lexical stress
  • prosody

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Shahin, M., Gutierrez-Osuna, R., & Ahmed, B. (2016). Classification of bisyllabic lexical stress patterns in disordered speech using deep learning. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 6480-6484). [7472925] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472925

Classification of bisyllabic lexical stress patterns in disordered speech using deep learning. / Shahin, Mostafa; Gutierrez-Osuna, Ricardo; Ahmed, Beena.

2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May Institute of Electrical and Electronics Engineers Inc., 2016. p. 6480-6484 7472925.

Shahin, M, Gutierrez-Osuna, R & Ahmed, B 2016, Classification of bisyllabic lexical stress patterns in disordered speech using deep learning. in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. vol. 2016-May, 7472925, Institute of Electrical and Electronics Engineers Inc., pp. 6480-6484, 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, 20/3/16. https://doi.org/10.1109/ICASSP.2016.7472925
Shahin M, Gutierrez-Osuna R, Ahmed B. Classification of bisyllabic lexical stress patterns in disordered speech using deep learning. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May. Institute of Electrical and Electronics Engineers Inc. 2016. p. 6480-6484. 7472925 https://doi.org/10.1109/ICASSP.2016.7472925
Shahin, Mostafa ; Gutierrez-Osuna, Ricardo ; Ahmed, Beena. / Classification of bisyllabic lexical stress patterns in disordered speech using deep learning. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings. Vol. 2016-May Institute of Electrical and Electronics Engineers Inc., 2016. pp. 6480-6484
@inproceedings{b8891570858f48089f1bdfb5a641bc78,
title = "Classification of bisyllabic lexical stress patterns in disordered speech using deep learning",
abstract = "Technology-based therapy tools can be of great benefit to children with developmental speech disabilities as they typically require sustained practice with a speech therapist for several years. Towards this aim, over the past 4 years we have developed speech processing tools to automatically detect common errors in disordered speech. This paper presents an automated technique to identify incorrect lexical stress. Specifically, we describe a deep neural network (DNN) that can be used to classify the four different bisyllabic stress patterns: strong-weak (SW), weak-strong (WS), strong-strong (SS) and weak-weak (WW). We derive input features for the DNN from the duration, pitch, intensity and spectral energy on each of the two consecutive syllables. Using these features, we achieve 93{\%} correct classification between SW/WS stress patterns and 88{\%} correct classification of the four bisyllabic patterns on speech from typically developing children, while we obtain 73.4{\%} classification between SW/WS in disordered speech. These figures represent a two-fold reduction in error rates compared to our prior work, which used a DNN with differential features from consecutive syllables.",
keywords = "automated speech therapy, deep neural network, lexical stress, prosody",
author = "Mostafa Shahin and Ricardo Gutierrez-Osuna and Beena Ahmed",
year = "2016",
month = "5",
day = "18",
doi = "10.1109/ICASSP.2016.7472925",
language = "English",
volume = "2016-May",
pages = "6480--6484",
booktitle = "2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN
T1 - Classification of bisyllabic lexical stress patterns in disordered speech using deep learning
AU - Shahin, Mostafa
AU - Gutierrez-Osuna, Ricardo
AU - Ahmed, Beena
PY - 2016/5/18
Y1 - 2016/5/18
AB - Technology-based therapy tools can be of great benefit to children with developmental speech disabilities as they typically require sustained practice with a speech therapist for several years. Towards this aim, over the past 4 years we have developed speech processing tools to automatically detect common errors in disordered speech. This paper presents an automated technique to identify incorrect lexical stress. Specifically, we describe a deep neural network (DNN) that can be used to classify the four different bisyllabic stress patterns: strong-weak (SW), weak-strong (WS), strong-strong (SS) and weak-weak (WW). We derive input features for the DNN from the duration, pitch, intensity and spectral energy on each of the two consecutive syllables. Using these features, we achieve 93% correct classification between SW/WS stress patterns and 88% correct classification of the four bisyllabic patterns on speech from typically developing children, while we obtain 73.4% classification between SW/WS in disordered speech. These figures represent a two-fold reduction in error rates compared to our prior work, which used a DNN with differential features from consecutive syllables.
KW - automated speech therapy
KW - deep neural network
KW - lexical stress
KW - prosody
UR - http://www.scopus.com/inward/record.url?scp=84973375652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973375652&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472925
DO - 10.1109/ICASSP.2016.7472925
M3 - Conference contribution
VL - 2016-May
SP - 6480
EP - 6484
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
ER -