Automatic classification of lexical stress in English and Arabic languages using deep learning

Mostafa Shahin, Julien Epps, Beena Ahmed

Research output: Contribution to journal (Article)

Abstract

Prosodic features are important for the intelligibility of speech and for perceived proficiency in stress-timed languages such as English and Arabic. Producing the appropriate lexical stress is challenging for second language (L2) learners, particularly those whose first language (L1) is syllable-timed, such as Spanish or French. In this paper we introduce a method for the automatic classification of lexical stress, intended for integration into computer-aided pronunciation learning (CAPL) tools for L2 learning. We trained two deep learning architectures, a deep feedforward neural network (DNN) and a deep convolutional neural network (CNN), using a set of temporal and spectral features related to intensity, duration, pitch and energy in different frequency bands. The system was applied to English (child and adult) and Arabic (adult) speech corpora collected from native speakers. Our method yields error rates of 9%, 7% and 18% when tested on the English children corpus, the English adult corpus and the Arabic adult corpus, respectively.
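
The abstract names the feature families (intensity, duration, pitch, band energies) but not how they are computed. The following is a minimal sketch, not the authors' implementation, of how duration, log energy and per-band spectral energy might be extracted from one syllable segment. The function name `syllable_features`, the band edges and the naive DFT are all assumptions chosen for illustration; pitch estimation is left out.

```python
import cmath
import math


def syllable_features(samples, sample_rate,
                      band_edges_hz=((0, 500), (500, 2000), (2000, 4000))):
    """Hypothetical per-syllable features: duration, log energy, band energies.

    The band edges are illustrative assumptions, not values from the paper.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty syllable segment")

    # Temporal features: segment duration and mean-square (intensity-like) energy.
    duration_s = n / sample_rate
    mean_square = sum(x * x for x in samples) / n
    log_energy = math.log(mean_square + 1e-12)  # small offset avoids log(0)

    # Spectral features: naive O(n^2) DFT power spectrum up to the Nyquist bin.
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power.append(abs(coeff) ** 2)

    # Sum the power of every bin whose centre frequency falls in [lo, hi).
    bin_hz = sample_rate / n
    band_energy = [
        sum(p for k, p in enumerate(power) if lo <= k * bin_hz < hi)
        for lo, hi in band_edges_hz
    ]
    return {"duration_s": duration_s,
            "log_energy": log_energy,
            "band_energy": band_energy}


# Usage: a 10 ms, 1 kHz test tone sampled at 8 kHz stands in for a syllable.
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(80)]
features = syllable_features(tone, 8000)
```

For the test tone, nearly all spectral energy should land in the middle (500 to 2000 Hz) band. In a real system, feature vectors like this would be computed per syllable and passed to the classifier; a fast FFT routine would replace the naive DFT.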

Original language: English
Pages (from-to): 175-179
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOI: 10.21437/Interspeech.2016-644
Publication status: Published - 2016

Fingerprint

Feedforward neural networks
Frequency bands
Neural networks
Error Rate
Corpus
Language
Learning
Deep learning
Lexical Stress
Arabic Language
Energy
Children
Speech
Architecture
Intelligibility
Proficiency
L2 Learning
L2 Learners

Keywords

  • Arabic lexical stress
  • Convolutional neural network
  • Deep neural network
  • Lexical stress detection

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{eacbb444473a43ca931dd83e30521dbf,
title = "Automatic classification of lexical stress in English and Arabic languages using deep learning",
keywords = "Arabic lexical stress, Convolutional neural network, Deep neural network, Lexical stress detection",
author = "Mostafa Shahin and Julien Epps and Beena Ahmed",
year = "2016",
doi = "10.21437/Interspeech.2016-644",
language = "English",
volume = "08-12-September-2016",
pages = "175--179",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X"
}

TY - JOUR

T1 - Automatic classification of lexical stress in English and Arabic languages using deep learning

AU - Shahin, Mostafa

AU - Epps, Julien

AU - Ahmed, Beena

PY - 2016

Y1 - 2016

KW - Arabic lexical stress

KW - Convolutional neural network

KW - Deep neural network

KW - Lexical stress detection

UR - http://www.scopus.com/inward/record.url?scp=84994246152&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994246152&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-644

DO - 10.21437/Interspeech.2016-644

M3 - Article

VL - 08-12-September-2016

SP - 175

EP - 179

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -