Learning from relatives: Unified dialectal Arabic segmentation

Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

Arabic dialects share not only a common koiné but also pan-dialectal linguistic phenomena that allow computational models for one dialect to learn from data in the others. In this paper, we build a unified segmentation model by combining the training data for different dialects and training a single model on the pooled data. This model yields higher accuracy than dialect-specific models, eliminating the need to identify the dialect before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness correlates with geographical proximity. Our experiments use SVM-based ranking and bi-LSTM-CRF sequence labeling.
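The abstract does not describe the data format, so the following is only a minimal sketch of how a unified training set could be prepared, assuming segmentation is cast as character-level sequence labeling (as a bi-LSTM-CRF tagger would require). The `+` segment separator, the BMES tag scheme, the Buckwalter-style transliterated examples, the dialect codes, and the helper names `word_to_tags`, `tags_to_word` and `pool_dialects` are all hypothetical illustrations, not the authors' code. The SVM ranker and the bi-LSTM-CRF model themselves are not shown.

```python
from typing import Dict, List, Tuple


def word_to_tags(segmented: str, sep: str = "+") -> Tuple[str, List[str]]:
    """Convert a segmented word such as 'w+ktb+hA' into its raw characters and
    one BMES tag per character: B/M/E mark the begin/middle/end of a
    multi-character segment, S marks a single-character segment.
    (Hypothetical format: '+' is assumed to separate segments.)"""
    chars: List[str] = []
    tags: List[str] = []
    for seg in segmented.split(sep):
        if not seg:
            continue  # ignore empty pieces caused by stray separators
        chars.extend(seg)
        if len(seg) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(seg) - 2) + ["E"])
    return "".join(chars), tags


def tags_to_word(chars: str, tags: List[str], sep: str = "+") -> str:
    """Inverse of word_to_tags: insert a separator before every character
    tagged B or S, except at the start of the word."""
    out: List[str] = []
    for i, (c, t) in enumerate(zip(chars, tags)):
        if i > 0 and t in ("B", "S"):
            out.append(sep)
        out.append(c)
    return "".join(out)


def pool_dialects(corpora: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Build one unified training set by concatenating the per-dialect data and
    discarding the dialect label, so a single model is trained and no dialect
    identification is needed at prediction time."""
    pooled: List[Tuple[str, List[str]]] = []
    for dialect in sorted(corpora):  # sorted only for a deterministic order
        for word in corpora[dialect]:
            pooled.append(word_to_tags(word))
    return pooled


if __name__ == "__main__":
    # Hypothetical toy corpora keyed by made-up dialect codes.
    data = {"EGY": ["w+ktb+hA"], "LEV": ["b+yktb"]}
    for chars, tags in pool_dialects(data):
        print(chars, tags, tags_to_word(chars, tags))
```

Because the dialect label is simply dropped when pooling, the same tagger can be applied to text from any of the dialects, which mirrors the abstract's point that a unified model removes the dialect-identification step. The cross-dialect relatedness experiment would instead keep the corpora separate, training on one dialect and evaluating on each of the others.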

Original language: English
Title of host publication: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 432-441
Number of pages: 10
ISBN (Electronic): 9781945626548
Publication status: Published - 1 Jan 2017
Event: 21st Conference on Computational Natural Language Learning, CoNLL 2017 - Vancouver, Canada
Duration: 3 Aug 2017 to 4 Aug 2017

Publication series

Name: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 21st Conference on Computational Natural Language Learning, CoNLL 2017
Country: Canada
City: Vancouver
Period: 3/8/17 to 4/8/17

ASJC Scopus subject areas

  • Linguistics and Language
  • Artificial Intelligence
  • Human-Computer Interaction

Cite this

Samih, Y., Eldesouki, M., Attia, M., Darwish, K., Abdelali, A., Mubarak, H., & Kallmeyer, L. (2017). Learning from relatives: Unified dialectal Arabic segmentation. In CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings (pp. 432-441). Association for Computational Linguistics (ACL).