Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification

Maryam Najafian, Sameer Khurana, Suwon Shan, Ahmed Ali, James Glass

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we investigate different approaches for Dialect Identification (DID) in Arabic broadcast speech. Dialects differ in their inventory of phonological segments. This paper proposes a new phonotactic-based feature representation approach which enables discrimination among different occurrences of the same phone n-grams with different phone duration and probability statistics. To achieve further gains in accuracy, we used multilingual phone recognizers trained separately on Arabic, English, Czech, Hungarian, and Russian. We use Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. The final system fusion results in 24.7% and 19.0% relative error rate reductions compared to a conventional phonotactic DID system and to i-vectors with bottleneck features, respectively.
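The sketch below is an illustrative companion to the abstract, not the authors' implementation: it shows one way a phonotactic representation of this kind might be built, where phone n-grams from a phone recognizer are tagged with quantized duration and probability statistics, so that identical n-grams with different statistics map to different feature dimensions, and the resulting count vectors feed an SVM backend. The PhoneToken class, the bucket edges, and the toy utterances are all hypothetical assumptions.

# Minimal sketch (assumptions noted above) of a phonotactic feature
# representation in the spirit of the abstract, with an SVM backend.
from collections import Counter
from dataclasses import dataclass
from typing import List

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


@dataclass
class PhoneToken:
    phone: str       # phone label from a phone recognizer
    duration: float  # duration in seconds
    prob: float      # recognizer confidence / posterior


def bucket(value: float, edges: List[float]) -> int:
    """Coarsely quantize a value by returning the index of the first edge it falls under."""
    for i, e in enumerate(edges):
        if value < e:
            return i
    return len(edges)


def phonotactic_features(tokens: List[PhoneToken], n: int = 2) -> Counter:
    """Count phone n-grams tagged with quantized duration/probability statistics,
    so the same n-gram with different statistics lands in different dimensions."""
    feats = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        label = "_".join(t.phone for t in gram)
        dur_b = bucket(sum(t.duration for t in gram) / n, edges=[0.05, 0.10, 0.20])
        prob_b = bucket(sum(t.prob for t in gram) / n, edges=[0.5, 0.8])
        feats[f"{label}|d{dur_b}|p{prob_b}"] += 1
    return feats


if __name__ == "__main__":
    # Toy utterances standing in for phone-recognizer output on two dialects.
    utt_a = [PhoneToken("b", 0.06, 0.9), PhoneToken("aa", 0.15, 0.7), PhoneToken("t", 0.04, 0.95)]
    utt_b = [PhoneToken("b", 0.12, 0.6), PhoneToken("aa", 0.25, 0.9), PhoneToken("t", 0.09, 0.85)]

    X = [phonotactic_features(u) for u in (utt_a, utt_b)]
    y = ["dialect_A", "dialect_B"]

    # Sparse n-gram counts -> linear SVM, a common phonotactic DID backend.
    clf = make_pipeline(DictVectorizer(), LinearSVC())
    clf.fit(X, y)
    print(clf.predict([phonotactic_features(utt_a)]))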

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5174-5178
Number of pages: 5
Volume: 2018-April
ISBN (Print): 9781538646588
DOIs: https://doi.org/10.1109/ICASSP.2018.8461486
Publication status: Published - 10 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 2018 - 20 Apr 2018

Other

Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 15/4/18 - 20/4/18


Keywords

  • CNN
  • Dialect identification
  • Phonotactics

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Najafian, M., Khurana, S., Shan, S., Ali, A., & Glass, J. (2018). Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (Vol. 2018-April, pp. 5174-5178). [8461486] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461486

