Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base

Miguel García, Jesús Giménez, Lluis Marques

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The proposed method exploits the Multilingual Central Repository (a set of linked WordNets for different languages) as a domain-independent knowledge base that provides translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so that they can interact with the other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks, obtaining a moderate but significant improvement in translation quality that is consistent across a number of standard automatic evaluation metrics. The improvement is especially remarkable when moving to a very different domain, such as the translation of Biblical texts.
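
The abstract does not spell out the estimation heuristics, so the following Python sketch only illustrates the general idea: derive candidate translations for a source word from synsets shared across linked wordnets, turn them into a probability distribution, and add that distribution as one extra feature in a log-linear decoder score, so it interacts with the corpus-derived phrase table rather than replacing it. Everything concrete here is an assumption: the toy wordnet (`TOY_WORDNET`), the uniform sense/lemma weighting in `wordnet_translation_probs`, the smoothing constant `FLOOR`, and the feature weights. It is not the authors' heuristic, which also uses WordNet topology and local context.

```python
import math
from collections import defaultdict

# Hypothetical toy multilingual wordnet: synset id -> lemmas per language.
# Illustrative only; it is not taken from the Multilingual Central Repository.
TOY_WORDNET = {
    "bank.n.01": {"en": ["bank"], "es": ["orilla", "ribera"]},
    "bank.n.02": {"en": ["bank", "depository"], "es": ["banco"]},
    "seat.n.01": {"en": ["seat"], "es": ["asiento", "banco"]},
}

# Assumed smoothing value for (source, target) pairs a model does not cover,
# so a missing score lowers a candidate's score instead of ruling it out.
FLOOR = 1e-4


def wordnet_translation_probs(word, wordnet, src="en", tgt="es"):
    """Estimate P(t | word) with a hypothetical uniform heuristic:
    split the mass equally over the senses of `word` that have target
    lemmas, then equally over the target lemmas of each sense."""
    senses = [
        s for s, lemmas in wordnet.items()
        if word in lemmas.get(src, []) and lemmas.get(tgt)
    ]
    probs = defaultdict(float)
    for s in senses:
        targets = wordnet[s][tgt]
        for t in targets:
            probs[t] += (1.0 / len(senses)) * (1.0 / len(targets))
    return dict(probs)


def loglinear_score(features, weights):
    """Weighted sum of log feature values, as in a log-linear SMT model."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


def score_candidate(src, tgt, phrase_table, wn_probs, weights):
    """Score one word translation using the corpus-derived phrase table and
    the WordNet-derived distribution as two separate features."""
    features = {
        "p_phrase": phrase_table.get((src, tgt), FLOOR),
        "p_wordnet": wn_probs.get(tgt, FLOOR),
    }
    return loglinear_score(features, weights)


if __name__ == "__main__":
    probs = wordnet_translation_probs("bank", TOY_WORDNET)
    print(probs)  # {'orilla': 0.25, 'ribera': 0.25, 'banco': 0.5}

    phrase_table = {("bank", "banco"): 0.9}  # hypothetical corpus estimate
    weights = {"p_phrase": 1.0, "p_wordnet": 0.3}  # hypothetical tuned weights
    for tgt in ("banco", "orilla"):
        print(tgt, round(score_candidate("bank", tgt, phrase_table, probs, weights), 3))
```

In this toy setup "orilla" never appears in the phrase table, yet it becomes a scorable candidate through the WordNet feature; with the assumed weights the corpus estimate still dominates, which is one way to read "softly integrated": the new evidence adjusts scores instead of overriding the other models.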

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 306-317
Number of pages: 12
Volume: 5449 LNCS
DOIs: https://doi.org/10.1007/978-3-642-00382-0_25
Publication status: Published - 21 Jul 2009
Externally published: Yes
Event: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009 - Mexico City, Mexico
Duration: 1 Mar 2009 to 7 Mar 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5449 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009
Country: Mexico
City: Mexico City
Period: 1/3/09 to 7/3/09

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

García, M., Giménez, J., & Marques, L. (2009). Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5449 LNCS, pp. 306-317). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5449 LNCS). https://doi.org/10.1007/978-3-642-00382-0_25

Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. / García, Miguel; Giménez, Jesús; Marques, Lluis.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5449 LNCS 2009. p. 306-317 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5449 LNCS).

García, M, Giménez, J & Marques, L 2009, Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 5449 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5449 LNCS, pp. 306-317, 10th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2009, Mexico City, Mexico, 1/3/09. https://doi.org/10.1007/978-3-642-00382-0_25
García M, Giménez J, Marques L. Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5449 LNCS. 2009. p. 306-317. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-00382-0_25
García, Miguel ; Giménez, Jesús ; Marques, Lluis. / Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 5449 LNCS 2009. pp. 306-317 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b78717994bfa4e019c4c319ff57d3875,
title = "Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base",
abstract = "This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The method proposed exploits the Multilingual Central Repository (a group of linked WordNets from different languages), as a domain-independent knowledge database, to provide translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so they can interact with other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks obtaining a moderate but significant improvement in translation quality consistently according to a number of standard automatic evaluation metrics. This improvement is especially remarkable when we move to a very different domain, such as the translation of Biblical texts.",
author = "Miguel Garc{\'i}a and Jes{\'u}s Gim{\'e}nez and Lluis Marques",
year = "2009",
month = "7",
day = "21",
doi = "10.1007/978-3-642-00382-0_25",
language = "English",
isbn = "3642003818",
volume = "5449 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "306--317",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Enriching Statistical Translation models using a domain-independent multilingual lexical knowledge base

AU - García, Miguel

AU - Giménez, Jesús

AU - Marques, Lluis

PY - 2009/7/21

Y1 - 2009/7/21

N2 - This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The method proposed exploits the Multilingual Central Repository (a group of linked WordNets from different languages), as a domain-independent knowledge database, to provide translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so they can interact with other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks obtaining a moderate but significant improvement in translation quality consistently according to a number of standard automatic evaluation metrics. This improvement is especially remarkable when we move to a very different domain, such as the translation of Biblical texts.

AB - This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The method proposed exploits the Multilingual Central Repository (a group of linked WordNets from different languages), as a domain-independent knowledge database, to provide translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so they can interact with other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks obtaining a moderate but significant improvement in translation quality consistently according to a number of standard automatic evaluation metrics. This improvement is especially remarkable when we move to a very different domain, such as the translation of Biblical texts.

UR - http://www.scopus.com/inward/record.url?scp=67650529339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650529339&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00382-0_25

DO - 10.1007/978-3-642-00382-0_25

M3 - Conference contribution

AN - SCOPUS:67650529339

SN - 3642003818

SN - 9783642003813

VL - 5449 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 306

EP - 317

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -