Transforming standard arabic to colloquial arabic

Emad Mohamed, Behrang Mohit, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceedingConference contribution

7 Citations (Scopus)

Abstract

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24% to 86.84% on unseen CEA text, and reduces the percentage of out-ofvocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; e.g., this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.

Original languageEnglish
Title of host publication50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Pages176-180
Number of pages5
Publication statusPublished - 1 Dec 2012
Event50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Jeju Island, Korea, Republic of
Duration: 8 Jul 201214 Jul 2012

Publication series

Name50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Volume2

Other

Other50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
CountryKorea, Republic of
CityJeju Island
Period8/7/1214/7/12

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Mohamed, E., Mohit, B., & Oflazer, K. (2012). Transforming standard arabic to colloquial arabic. In 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference (pp. 176-180). (50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference; Vol. 2).