Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base

Miguel García, Jesús Giménez, Lluís Márquez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The proposed method exploits the Multilingual Central Repository (a set of linked WordNets for different languages) as a domain-independent knowledge base that supplies translation models with new candidate translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so that they can interact with the other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks, obtaining a moderate but significant improvement in translation quality that is consistent across a number of standard automatic evaluation metrics. The improvement is especially remarkable when moving to a very different domain, such as the translation of Biblical texts.
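
One way to picture the soft integration described in the abstract is as an extra feature scored alongside the corpus-trained translation model. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a log-linear combination (common in phrase-based SMT, but not stated in the abstract), and the dictionaries, probabilities, floor value, weights, and function names are all hypothetical stand-ins. In particular, `lexicon_candidates` merely plays the role of translations looked up in a multilingual lexical knowledge base.

```python
import math

# Hypothetical corpus-trained phrase table: P(target | source).
phrase_table = {
    "bank": {"banco": 0.9, "orilla": 0.1},
}

# Hypothetical lexicon-derived candidates with heuristic scores.
# These numbers are invented for illustration only.
lexicon_candidates = {
    "bank": {"banco": 0.5, "ribera": 0.3, "orilla": 0.2},
    "ark": {"arca": 1.0},
}

FLOOR = 1e-4  # assumed smoothing floor for a candidate unknown to one model


def candidates(source):
    """Union of the translations proposed by either knowledge source."""
    from_table = set(phrase_table.get(source, {}))
    from_lexicon = set(lexicon_candidates.get(source, {}))
    return from_table | from_lexicon


def score(source, target, w_pt=1.0, w_lex=0.5):
    """Weighted sum of log-probabilities (a log-linear combination)."""
    p_pt = phrase_table.get(source, {}).get(target, FLOOR)
    p_lex = lexicon_candidates.get(source, {}).get(target, FLOOR)
    return w_pt * math.log(p_pt) + w_lex * math.log(p_lex)


def best_translation(source, **weights):
    """Highest-scoring candidate, or None if neither source knows the word."""
    options = candidates(source)
    if not options:
        return None
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(options), key=lambda t: score(source, t, **weights))
```

In this toy setup the lexicon contributes a new candidate ("ribera") for a word the phrase table already covers, and supplies the only candidate for a word the phrase table has never seen ("ark"). Because both scores enter as weighted features rather than hard overrides, the corpus-trained model can still outvote the lexicon, which is the sense of "soft" assumed here.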

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings
Pages: 306-317
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-00382-0_25
Publication status: Published - 21 Jul 2009
Event: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: 1 Mar 2009 - 7 Mar 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country: Mexico
City: Mexico City
Period: 1/3/09 - 7/3/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

García, M., Giménez, J., & Márquez, L. (2009). Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. In Computational Linguistics and Intelligent Text Processing - 10th International Conference, CICLing 2009, Proceedings (pp. 306-317). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5449 LNCS). https://doi.org/10.1007/978-3-642-00382-0_25