Context-based Arabic morphological analysis for machine translation

Thuy Linh Nguyen, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

In this paper, we present a novel morphology preprocessing technique for Arabic-English translation. We exploit the Arabic morphology-English alignment to learn a model that removes nonaligned Arabic morphemes. The model is an instance of the Conditional Random Field (Lafferty et al., 2001) model; it deletes a morpheme based on the morpheme's context. We achieved an improvement of around two BLEU points over the original Arabic translation for both a travel-domain system trained on 20K sentence pairs and a news-domain system trained on 177K sentence pairs, and showed a potential improvement for a large-scale SMT system trained on 5 million sentence pairs.
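
As a rough illustration of the idea in the abstract, the sketch below treats morpheme removal as sequence labeling: each morpheme gets context features, a tagger assigns KEEP or DELETE, and the DELETE morphemes are dropped before translation. It is not the authors' implementation. The feature set, the one-morpheme window, the label names, the function names, and the sample morphemes and labels are all assumptions made for illustration; in a real system the labels would come from a trained CRF rather than being written by hand.

```python
from typing import Dict, List


def context_features(morphemes: List[str], i: int) -> Dict[str, str]:
    """Features for morpheme i from a +/-1 window (window size is an assumption)."""
    prev_m = morphemes[i - 1] if i > 0 else "<s>"
    next_m = morphemes[i + 1] if i < len(morphemes) - 1 else "</s>"
    return {"cur": morphemes[i], "prev": prev_m, "next": next_m}


def apply_labels(morphemes: List[str], labels: List[str]) -> List[str]:
    """Drop every morpheme tagged DELETE and keep the rest in order."""
    if len(morphemes) != len(labels):
        raise ValueError("need exactly one label per morpheme")
    return [m for m, y in zip(morphemes, labels) if y == "KEEP"]


# Hypothetical segmented word in Buckwalter-style transliteration,
# with hand-written labels standing in for a trained tagger's output.
morphemes = ["w+", "Al+", "ktAb"]
labels = ["KEEP", "DELETE", "KEEP"]

# One feature dictionary per morpheme; a CRF toolkit would consume these.
features = [context_features(morphemes, i) for i in range(len(morphemes))]
kept = apply_labels(morphemes, labels)
```

The feature dictionaries are the kind of per-token input a CRF toolkit typically accepts, so the same functions could feed training (with labels derived from word alignments) and decoding.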

Original language: English
Title of host publication: CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning
Pages: 135-142
Number of pages: 8
Publication status: Published - 1 Dec 2008
Event: 12th Conference on Computational Natural Language Learning, CoNLL 2008 - Manchester, United Kingdom
Duration: 16 Aug 2008 - 17 Aug 2008

Publication series

Name: CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning

Other

Other: 12th Conference on Computational Natural Language Learning, CoNLL 2008
Country: United Kingdom
City: Manchester
Period: 16/8/08 - 17/8/08

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language

Cite this

Nguyen, T. L., & Vogel, S. (2008). Context-based Arabic morphological analysis for machine translation. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 135-142). (CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning).