Context-based Arabic morphological analysis for machine translation

ThuyLinh Nguyen, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

In this paper, we present a novel morphology preprocessing technique for Arabic-English translation. We exploit the alignment between Arabic morphemes and English words to learn a model that removes non-aligned Arabic morphemes. The model is an instance of the Conditional Random Field (Lafferty et al., 2001) model; it deletes a morpheme based on the morpheme's context. We achieved an improvement of around two BLEU points over the original Arabic translation for both a travel-domain system trained on 20K sentence pairs and a news-domain system trained on 177K sentence pairs, and showed a potential improvement for a large-scale SMT system trained on 5 million sentence pairs.
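The abstract describes two steps: deriving keep/delete labels for Arabic morphemes from their alignment to English, and a linear-chain CRF that decides deletion from a morpheme's context. The following is a minimal sketch of both steps in Python, not the authors' code. It assumes alignments given as (arabic_index, english_index) pairs, a hypothetical ±1-morpheme window feature template, hypothetical label names KEEP/DELETE, and hand-set weights standing in for trained CRF parameters. Training the weights (maximizing the CRF log-likelihood) is omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical label names: keep or delete each Arabic morpheme.
KEEP, DELETE = "KEEP", "DELETE"
LABELS = (KEEP, DELETE)


def labels_from_alignment(num_morphemes: int,
                          alignment: Set[Tuple[int, int]]) -> List[str]:
    """Training labels: KEEP a morpheme if any English word aligns to it, else DELETE."""
    aligned = {ar for ar, _en in alignment}
    return [KEEP if i in aligned else DELETE for i in range(num_morphemes)]


def context_features(morphemes: List[str], i: int, window: int = 1) -> List[str]:
    """Hypothetical feature template: the morpheme itself plus its neighbours."""
    feats = [f"m[0]={morphemes[i]}"]
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        tok = morphemes[j] if 0 <= j < len(morphemes) else "<PAD>"
        feats.append(f"m[{off}]={tok}")
    return feats


def viterbi(morphemes: List[str],
            emit_w: Dict[Tuple[str, str], float],
            trans_w: Dict[Tuple[str, str], float]) -> List[str]:
    """Best label sequence under a linear-chain CRF score:
    sum over i of (sum of emit_w[f, y_i] for features f) + trans_w[y_{i-1}, y_i]."""
    n = len(morphemes)
    if n == 0:
        return []

    def emit(i: int, y: str) -> float:
        return sum(emit_w.get((f, y), 0.0) for f in context_features(morphemes, i))

    score: List[Dict[str, float]] = [{y: emit(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for y in LABELS:
            # Best previous label for current label y.
            prev = max(LABELS, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def delete_morphemes(morphemes: List[str], labels: List[str]) -> List[str]:
    """Preprocessing output: drop the morphemes labelled DELETE."""
    return [m for m, y in zip(morphemes, labels) if y == KEEP]


if __name__ == "__main__":
    # Toy, hypothetical segmentation; only morpheme 1 aligns to English word 0.
    ms = ["Al+", "ktAb"]
    print(labels_from_alignment(len(ms), {(1, 0)}))  # ['DELETE', 'KEEP']
    w = {("m[0]=Al+", DELETE): 2.0, ("m[0]=ktAb", KEEP): 1.0}
    print(delete_morphemes(ms, viterbi(ms, w, {})))  # ['ktAb']
```

In a real setup the weights would come from training on the alignment-derived labels, and the reduced morpheme sequence would then be passed to the SMT system in place of the original Arabic.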

Original language: English
Title of host publication: CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning
Pages: 135-142
Number of pages: 8
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 12th Conference on Computational Natural Language Learning, CoNLL 2008 - Manchester, United Kingdom
Duration: 16 Aug 2008 to 17 Aug 2008

Other

Other: 12th Conference on Computational Natural Language Learning, CoNLL 2008
Country: United Kingdom
City: Manchester
Period: 16/8/08 to 17/8/08

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Linguistics and Language

Cite this

Nguyen, T., & Vogel, S. (2008). Context-based Arabic morphological analysis for machine translation. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 135-142).

Context-based Arabic morphological analysis for machine translation. / Nguyen, ThuyLinh; Vogel, Stephan.

CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning. 2008. p. 135-142.

Nguyen, T & Vogel, S 2008, Context-based Arabic morphological analysis for machine translation. in CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning. pp. 135-142, 12th Conference on Computational Natural Language Learning, CoNLL 2008, Manchester, United Kingdom, 16/8/08.
Nguyen T, Vogel S. Context-based Arabic morphological analysis for machine translation. In CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning. 2008. p. 135-142
Nguyen, ThuyLinh ; Vogel, Stephan. / Context-based Arabic morphological analysis for machine translation. CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning. 2008. pp. 135-142
@inproceedings{68e29d8d31c64695bf75ea7be993c14c,
title = "Context-based Arabic morphological analysis for machine translation",
abstract = "In this paper, we present a novel morphology preprocessing technique for Arabic-English translation. We exploit the Arabic morphology-English alignment to learn a model removing nonaligned Arabic morphemes. The model is an instance of the Conditional Random Field (Lafferty et al., 2001) model; it deletes a morpheme based on the morpheme's context. We achieved around two BLEU points improvement over the original Arabic translation for both a travel-domain system trained on 20K sentence pairs and a news-domain system trained on 177K sentence pairs, and showed a potential improvement for a large-scale SMT system trained on 5 million sentence pairs.",
author = "ThuyLinh Nguyen and Stephan Vogel",
year = "2008",
month = "12",
day = "1",
language = "English",
isbn = "1905593481",
pages = "135--142",
booktitle = "CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning",

}

TY - GEN

T1 - Context-based Arabic morphological analysis for machine translation

AU - Nguyen, ThuyLinh

AU - Vogel, Stephan

PY - 2008/12/1

Y1 - 2008/12/1

AB - In this paper, we present a novel morphology preprocessing technique for Arabic-English translation. We exploit the Arabic morphology-English alignment to learn a model removing nonaligned Arabic morphemes. The model is an instance of the Conditional Random Field (Lafferty et al., 2001) model; it deletes a morpheme based on the morpheme's context. We achieved around two BLEU points improvement over the original Arabic translation for both a travel-domain system trained on 20K sentence pairs and a news-domain system trained on 177K sentence pairs, and showed a potential improvement for a large-scale SMT system trained on 5 million sentence pairs.

UR - http://www.scopus.com/inward/record.url?scp=80053394217&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053394217&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1905593481

SN - 9781905593484

SP - 135

EP - 142

BT - CoNLL 2008 - Proceedings of the Twelfth Conference on Computational Natural Language Learning

ER -