QMDIS: QCRI-MIT advanced dialect identification system

Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James Glass

Research output: Contribution to journal (Conference article)

7 Citations (Scopus)

Abstract

As a continuation of our efforts towards tackling the problem of spoken Dialect Identification (DID) for Arabic, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS). QMDIS is an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). We report approximately 73% accuracy for system combination. All the data and the code used in our experiments are publicly available for research.
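The abstract does not say how the phonotactic, lexical and acoustic systems are combined. As a rough illustration only, the sketch below assumes a simple score-level fusion: each system emits a posterior over the five dialects, and the posteriors are merged by a weighted average. The function names, the weights and the example scores are all hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List

# The five classes named in the abstract.
DIALECTS = ["Egyptian", "Gulf", "Levantine", "North African", "MSA"]


def fuse_scores(system_scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-system class posteriors.

    This fusion rule is an assumption made for illustration; the
    paper's actual combination method is not given in the abstract.
    """
    total_weight = sum(weights[name] for name in system_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(DIALECTS)
    for name, scores in system_scores.items():
        if len(scores) != len(DIALECTS):
            raise ValueError(f"{name}: expected {len(DIALECTS)} scores")
        for i, score in enumerate(scores):
            fused[i] += weights[name] * score / total_weight
    return fused


def predict(fused: List[float]) -> str:
    """Return the dialect with the highest fused score."""
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return DIALECTS[best_index]


# Made-up posteriors for one utterance (illustrative values, not results).
example_scores = {
    "phonotactic": [0.10, 0.20, 0.50, 0.10, 0.10],
    "lexical":     [0.30, 0.10, 0.40, 0.10, 0.10],
    "acoustic":    [0.20, 0.40, 0.20, 0.10, 0.10],
}
# Equal weights, chosen arbitrarily for the sketch.
example_weights = {"phonotactic": 1.0, "lexical": 1.0, "acoustic": 1.0}

fused_example = fuse_scores(example_scores, example_weights)
predicted_dialect = predict(fused_example)
```

With equal weights the acoustic system's preference for Gulf is outvoted by the other two systems, so the fused decision follows the majority. In practice the weights would be tuned on held-out data rather than fixed by hand.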

Original language: English
Pages (from-to): 2591-2595
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
Publication status: Published - 1 Jan 2017

Keywords

  • Acoustic
  • Arabic
  • Convolutional Neural Network
  • Lexical
  • Logistic Regression
  • Phonotactic
  • Spoken Dialect Identification
  • Support Vector Machine

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
