Spoken Arabic Algerian dialect identification

Soumia Bougrine, Hadda Cherroun, Ahmed Abdelali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Dialect identification is a challenging task, and it becomes more complicated when dealing with under-resourced dialects. In this paper, we propose a system based on prosodic speech information, namely intonation and rhythm, for the identification of intra-country dialects. The speech features are extracted after a coarse-grained consonant/vowel segmentation. Dialect models are built using both Deep Neural Networks (DNNs) and Support Vector Machines (SVMs). The hyper-parameters of the DNN topology are tuned using a genetic algorithm. Our framework is implemented and evaluated on KALAM'DZ, a Web-based corpus dedicated to Algerian Arabic dialectal varieties, with more than 42 h of speech encompassing the four major Algerian subdialects: Hilali, Sulaymite, Ma'qilian, and Algiers-blanks. The results show that the DNN implementation of the Algerian Arabic Dialect IDentification system (a2did) reaches the same results as the SVM modeling. In addition, we conclude that a contrastive baseline acoustic-based classification system can serve as a complementary system to our a2did. The overall results reveal the suitability of our prosody-based a2did for speaker-independent dialect identification when utterances are short, which is a requirement for real-time applications.
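To illustrate the kind of rhythm feature that can be derived from a coarse consonant/vowel segmentation, here is a minimal Python sketch. The abstract does not say which rhythm measures the authors use; the three interval-based measures below (percentage of vocalic duration, and the standard deviations of consonantal and vocalic interval durations) are a common choice and are an assumption here, as are the function name `rhythm_features` and the input format of (label, duration) pairs.

```python
from statistics import pstdev


def rhythm_features(segments):
    """Compute interval-based rhythm measures from a C/V segmentation.

    `segments` is a list of (label, duration_in_seconds) pairs, where the
    label is "C" (consonantal interval) or "V" (vocalic interval).
    This input format is hypothetical, not taken from the paper.
    """
    vowels = [dur for label, dur in segments if label == "V"]
    consonants = [dur for label, dur in segments if label == "C"]
    total = sum(vowels) + sum(consonants)
    if total == 0:
        raise ValueError("segmentation contains no C/V intervals")
    return {
        # Share of the utterance occupied by vocalic intervals, in percent.
        "percent_v": 100.0 * sum(vowels) / total,
        # Variability of consonantal and vocalic interval durations.
        "delta_c": pstdev(consonants) if consonants else 0.0,
        "delta_v": pstdev(vowels) if vowels else 0.0,
    }


# Toy utterance: alternating consonantal and vocalic intervals.
example = [("C", 0.08), ("V", 0.12), ("C", 0.10), ("V", 0.10)]
features = rhythm_features(example)
```

A feature vector of this kind, computed per utterance, is the sort of input that a DNN or SVM classifier could consume; the actual feature set used by a2did may differ.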

Original language: English
Title of host publication: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538645437
DOIs: https://doi.org/10.1109/ICNLSP.2018.8374383
Publication status: Published - 6 Jun 2018
Event: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018 - Algiers, Algeria
Duration: 25 Apr 2018 to 26 Apr 2018

Other

Other: 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018
Country: Algeria
City: Algiers
Period: 25/4/18 to 26/4/18

Fingerprint

Algerian
dialect
neural network
Identification (control systems)
Genetic algorithms
Acoustics
Topology
Deep neural networks

Keywords

  • Algerian dialects
  • Deep Neural Networks
  • Dialect Identification
  • Prosody

ASJC Scopus subject areas

  • Linguistics and Language
  • Communication
  • Artificial Intelligence
  • Signal Processing

Cite this

Bougrine, S., Cherroun, H., & Abdelali, A. (2018). Spoken Arabic Algerian dialect identification. In 2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018 (pp. 1-6). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICNLSP.2018.8374383

@inproceedings{f5bcfe5706cf4410948927d5d8c77935,
title = "Spoken Arabic Algerian dialect identification",
keywords = "Algerian dialects, Deep Neural Networks, Dialect Identification, Prosody",
author = "Soumia Bougrine and Hadda Cherroun and Ahmed Abdelali",
year = "2018",
month = "6",
day = "6",
doi = "10.1109/ICNLSP.2018.8374383",
language = "English",
pages = "1--6",
booktitle = "2nd International Conference on Natural Language and Speech Processing, ICNLSP 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
