Transforming Standard Arabic to Colloquial Arabic

Emad Mohamed, Behrang Mohit, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24% to 86.84% on unseen CEA text, and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; e.g., this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.
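The figures in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only: the abstract does not say whether the out-of-vocabulary percentage is counted over tokens or over distinct word types, so the token-level definition in `oov_rate` is an assumption, and the function names and toy token lists are hypothetical. The relative reductions are derived here from the reported percentages and are not stated in the source.

```python
def oov_rate(train_tokens, test_tokens):
    """Percentage of test tokens whose surface form never occurs in training.

    Assumption: token-level counting; the abstract does not specify this.
    """
    if not test_tokens:
        return 0.0
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return 100.0 * unseen / len(test_tokens)


def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Percentages reported in the abstract.
acc_before, acc_after = 73.24, 86.84
oov_before, oov_after = 28.98, 16.66

acc_gain = acc_after - acc_before      # absolute gain in percentage points
oov_drop = oov_before - oov_after      # absolute drop in percentage points
# Derived quantities (not stated in the source):
err_reduction = relative_reduction(100 - acc_before, 100 - acc_after)
oov_reduction = relative_reduction(oov_before, oov_after)

print(f"POS accuracy gain:        {acc_gain:.2f} points")
print(f"Tagging error reduction:  {err_reduction:.1f}% relative")
print(f"OOV drop:                 {oov_drop:.2f} points")
print(f"OOV reduction:            {oov_reduction:.1f}% relative")

# Toy illustration of the OOV computation with made-up tokens.
toy_train = ["w1", "w2", "w3"]
toy_test = ["w1", "w4", "w4", "w2"]
print(f"Toy OOV rate:             {oov_rate(toy_train, toy_test):.1f}%")
```

Read this way, the reported gain of 13.60 accuracy points corresponds to roughly halving the tagging error, and the out-of-vocabulary rate falls by 12.32 points.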

Original language: English
Title of host publication: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Pages: 176-180
Number of pages: 5
Volume: 2
Publication status: Published - 2012
Event: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Jeju Island, Korea, Republic of
Duration: 8 Jul 2012 to 14 Jul 2012

Other

Other: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
Country: Korea, Republic of
City: Jeju Island
Period: 8/7/12 to 14/7/12

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Mohamed, E., Mohit, B., & Oflazer, K. (2012). Transforming standard Arabic to colloquial Arabic. In 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference (Vol. 2, pp. 176-180).

@inproceedings{27849fbd2b2242429c32c3ece3beca1c,
title = "Transforming Standard {A}rabic to Colloquial {A}rabic",
abstract = "We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24{\%} to 86.84{\%} on unseen CEA text, and reduces the percentage of out-of-vocabulary words from 28.98{\%} to 16.66{\%}. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; e.g., this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.",
author = "Emad Mohamed and Behrang Mohit and Kemal Oflazer",
year = "2012",
language = "English",
isbn = "9781937284251",
volume = "2",
pages = "176--180",
booktitle = "50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference",

}

TY - GEN

T1 - Transforming Standard Arabic to Colloquial Arabic

AU - Mohamed, Emad

AU - Mohit, Behrang

AU - Oflazer, Kemal

PY - 2012

Y1 - 2012

N2 - We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24% to 86.84% on unseen CEA text, and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic; e.g., this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.

UR - http://www.scopus.com/inward/record.url?scp=84878175593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878175593&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781937284251

VL - 2

SP - 176

EP - 180

BT - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference

ER -