Improving Egyptian-to-English SMT by mapping Egyptian into MSA

Nadir Durrani, Yaser Al-Onaizan, Abraham Ittycheriah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for Modern Standard Arabic (MSA) and English is abundant, it is scarce for Egyptian-English and Egyptian-MSA. Translation quality drops notably when translating Egyptian into English compared with translating MSA into English. One reason for this drop is the high out-of-vocabulary (OOV) rate; another is the dialectal difference between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for unknown Egyptian words through methods such as spelling correction and context-based synonym suggestion. Second, we apply a convolution model that uses English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction yields an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
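
The abstract names two ideas without giving their details: proposing in-vocabulary MSA candidates for unknown Egyptian words, and mapping Egyptian words into MSA with English as a pivot. The following is a minimal sketch of both, not the authors' implementation. It assumes that string similarity from Python's `difflib` can stand in for the spelling-correction model, and that pivoting can be approximated by marginalizing over English in two lexical translation tables. The function names, the toy vocabulary, and the probabilities are all hypothetical.

```python
from collections import defaultdict
from difflib import get_close_matches


def propose_msa_candidates(word, msa_vocab, n=3, cutoff=0.6):
    """Return up to n in-vocabulary MSA words that most resemble `word`.

    Assumption: surface similarity is a stand-in for a real spelling model.
    """
    if word in msa_vocab:
        return [word]  # already known to the MSA system, nothing to edit
    # get_close_matches returns the best matches first
    return get_close_matches(word, sorted(msa_vocab), n=n, cutoff=cutoff)


def pivot_through_english(p_en_given_egy, p_msa_given_en):
    """Compose two tables: p(msa | egy) = sum over en of p(en | egy) * p(msa | en)."""
    scores = defaultdict(lambda: defaultdict(float))
    for egy, en_probs in p_en_given_egy.items():
        for en, p_en in en_probs.items():
            for msa, p_msa in p_msa_given_en.get(en, {}).items():
                scores[egy][msa] += p_en * p_msa
    return {egy: dict(msa_probs) for egy, msa_probs in scores.items()}


# Toy, romanized placeholder data (hypothetical, for illustration only).
msa_vocab = {"kitab", "kataba", "maktab"}
candidates = propose_msa_candidates("kitaab", msa_vocab)

p_en_given_egy = {"egy_a": {"book": 0.8, "letter": 0.2}}
p_msa_given_en = {
    "book": {"msa_x": 1.0},
    "letter": {"msa_x": 0.5, "msa_y": 0.5},
}
p_msa_given_egy = pivot_through_english(p_en_given_egy, p_msa_given_en)
```

In a pipeline like the one the abstract describes, the highest-ranked candidate would replace the Egyptian token before the sentence is passed to the MSA-to-English decoder. How candidates are actually ranked and filtered in the paper is not stated here.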

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 271-282
Number of pages: 12
Volume: 8404 LNCS
Edition: PART 2
ISBN (Print): 9783642549021
DOIs: https://doi.org/10.1007/978-3-642-54903-8_23
Publication status: Published - 2014
Externally published: Yes
Event: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014 - Kathmandu, Nepal
Duration: 6 Apr 2014 to 12 Apr 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8404 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014
Country: Nepal
City: Kathmandu
Period: 6/4/14 to 12/4/14

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Durrani, N., Al-Onaizan, Y., & Ittycheriah, A. (2014). Improving Egyptian-to-English SMT by mapping Egyptian into MSA. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8404 LNCS, Part 2, pp. 271-282). Springer Verlag. https://doi.org/10.1007/978-3-642-54903-8_23

@inproceedings{d1055fa4856f4db0b1c2c5fb6bff1667,
title = "Improving Egyptian-to-English SMT by mapping Egyptian into MSA",
author = "Nadir Durrani and Yaser Al-Onaizan and Abraham Ittycheriah",
year = "2014",
doi = "10.1007/978-3-642-54903-8_23",
language = "English",
isbn = "9783642549021",
volume = "8404 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
number = "PART 2",
pages = "271--282",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 2",

}

TY - GEN
T1 - Improving Egyptian-to-English SMT by mapping Egyptian into MSA
AU - Durrani, Nadir
AU - Al-Onaizan, Yaser
AU - Ittycheriah, Abraham
PY - 2014
Y1 - 2014
UR - http://www.scopus.com/inward/record.url?scp=84958552279&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958552279&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-54903-8_23
DO - 10.1007/978-3-642-54903-8_23
M3 - Conference contribution
AN - SCOPUS:84958552279
SN - 9783642549021
VL - 8404 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 271
EP - 282
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
ER -