Evaluating automatic speech recognition for child speech therapy applications

Adam Hair, Kirrie J. Ballard, Beena Ahmed, Ricardo Gutierrez-Osuna

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic speech recognition (ASR) technology can be a useful tool in mobile apps for child speech therapy, empowering children to complete their practice with limited caregiver supervision. However, little is known about the feasibility of performing ASR on mobile devices, particularly when training data is limited. In this study, we investigated the performance of two low-resource ASR systems on disordered speech from children. We compared the open-source PocketSphinx (PS) recognizer using adapted acoustic models and a custom template-matching (TM) recognizer. TM and the adapted models significantly out-perform the default PS model. On average, maximum likelihood linear regression and maximum a posteriori adaptation increased PS accuracy from 59.4% to 63.8% and 80.0%, respectively, suggesting that the models successfully captured speaker-specific word production variations. TM reached a mean accuracy of 75.8%.
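
As a rough illustration of what a template-matching recognizer does, the sketch below labels an utterance by comparing its feature sequence against stored per-word templates and choosing the closest one. The abstract does not say how the authors' TM recognizer is implemented, so everything here is an assumption made for illustration: the use of dynamic time warping (DTW), the Euclidean frame distance, the toy two-dimensional features, and the names `dtw_distance` and `recognize`.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per time step (toy 2-D values here)


def dtw_distance(a: List[Frame], b: List[Frame]) -> float:
    """Accumulated DTW cost between two sequences (assumed metric: Euclidean)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def recognize(utterance: List[Frame],
              templates: Dict[str, List[List[Frame]]]) -> str:
    """Return the word whose nearest stored template has the lowest DTW cost."""
    if not templates:
        raise ValueError("at least one template is required")
    best_word, best_cost = "", math.inf
    for word, examples in templates.items():
        for example in examples:
            c = dtw_distance(utterance, example)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word


# Hypothetical toy data: one template per word, two features per frame.
templates = {
    "cat": [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]],
    "dog": [[(5.0, 5.0), (4.0, 4.0), (3.0, 3.0)]],
}
utterance = [(0.0, 0.1), (0.9, 1.0), (1.1, 1.0), (2.0, 2.1)]
print(recognize(utterance, templates))
```

In a speaker-specific setup, the templates would presumably be recordings of the same child, which is one way such a recognizer could work with very little training data. Real systems would use acoustic features such as MFCCs rather than the toy vectors above.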

Original language: English
Title of host publication: ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility
Publisher: Association for Computing Machinery, Inc
Pages: 578-580
Number of pages: 3
ISBN (Electronic): 9781450366762
DOI: https://doi.org/10.1145/3308561.3354606
Publication status: Published - 24 Oct 2019
Event: 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2019 - Pittsburgh, United States
Duration: 28 Oct 2019 to 30 Oct 2019

Publication series

Name: ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility

Conference

Conference: 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2019
Country: United States
City: Pittsburgh
Period: 28/10/19 to 30/10/19

Fingerprint

Speech recognition
Template matching
Application programs
Linear regression
Mobile devices
Maximum likelihood
Acoustics

Keywords

  • Assistive Technology
  • Computer-Assisted Pronunciation Training (CAPT)

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Hair, A., Ballard, K. J., Ahmed, B., & Gutierrez-Osuna, R. (2019). Evaluating automatic speech recognition for child speech therapy applications. In ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility (pp. 578-580). (ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308561.3354606

@inproceedings{9c51dd126dd446d99bd13bd86e405e91,
title = "Evaluating automatic speech recognition for child speech therapy applications",
abstract = "Automatic speech recognition (ASR) technology can be a useful tool in mobile apps for child speech therapy, empowering children to complete their practice with limited caregiver supervision. However, little is known about the feasibility of performing ASR on mobile devices, particularly when training data is limited. In this study, we investigated the performance of two low-resource ASR systems on disordered speech from children. We compared the open-source PocketSphinx (PS) recognizer using adapted acoustic models and a custom template-matching (TM) recognizer. TM and the adapted models significantly out-perform the default PS model. On average, maximum likelihood linear regression and maximum a posteriori adaptation increased PS accuracy from 59.4{\%} to 63.8{\%} and 80.0{\%}, respectively, suggesting that the models successfully captured speaker-specific word production variations. TM reached a mean accuracy of 75.8{\%}.",
keywords = "Assistive Technology, Computer-Assisted Pronunciation Training (CAPT)",
author = "Adam Hair and Ballard, {Kirrie J.} and Beena Ahmed and Ricardo Gutierrez-Osuna",
year = "2019",
month = "10",
day = "24",
doi = "10.1145/3308561.3354606",
language = "English",
series = "ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility",
publisher = "Association for Computing Machinery, Inc",
pages = "578--580",
booktitle = "ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility",

}

TY  - GEN
T1  - Evaluating automatic speech recognition for child speech therapy applications
AU  - Hair, Adam
AU  - Ballard, Kirrie J.
AU  - Ahmed, Beena
AU  - Gutierrez-Osuna, Ricardo
PY  - 2019/10/24
Y1  - 2019/10/24
AB  - Automatic speech recognition (ASR) technology can be a useful tool in mobile apps for child speech therapy, empowering children to complete their practice with limited caregiver supervision. However, little is known about the feasibility of performing ASR on mobile devices, particularly when training data is limited. In this study, we investigated the performance of two low-resource ASR systems on disordered speech from children. We compared the open-source PocketSphinx (PS) recognizer using adapted acoustic models and a custom template-matching (TM) recognizer. TM and the adapted models significantly out-perform the default PS model. On average, maximum likelihood linear regression and maximum a posteriori adaptation increased PS accuracy from 59.4% to 63.8% and 80.0%, respectively, suggesting that the models successfully captured speaker-specific word production variations. TM reached a mean accuracy of 75.8%.
KW  - Assistive Technology
KW  - Computer-Assisted Pronunciation Training (CAPT)
UR  - http://www.scopus.com/inward/record.url?scp=85074919763&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074919763&partnerID=8YFLogxK
U2  - 10.1145/3308561.3354606
DO  - 10.1145/3308561.3354606
M3  - Conference contribution
AN  - SCOPUS:85074919763
T3  - ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility
SP  - 578
EP  - 580
BT  - ASSETS 2019 - 21st International ACM SIGACCESS Conference on Computers and Accessibility
PB  - Association for Computing Machinery, Inc
ER  -