Robust estimation of feature weights in Statistical Machine Translation

Cristina España-Bonet, Lluis Marques

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Weights of the various components in a standard Statistical Machine Translation model are usually estimated via Minimum Error Rate Training. With this, one finds their optimum value on a development set with the expectation that these optimal weights generalise well to other test sets. However, this is not always the case when domains differ. This work uses a perceptron algorithm to learn more robust weights to be used on out-of-domain corpora without the need for specialised data. For an Arabic-to-English translation system, the generalisation of weights represents an improvement of more than 2 points of BLEU with respect to the MERT baseline using the same information.
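
The abstract does not say which perceptron variant or training signal the authors use, so the following is only a minimal sketch of the general idea: an averaged perceptron that adjusts the weights of a log-linear model so that, within each n-best list, the model prefers the hypothesis with the best sentence-level quality score. It is not the authors' implementation. The names `model_score`, `rerank`, and `perceptron_tune`, the `(features, quality)` representation, and the default hyperparameters are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one hypothesis is (feature values, quality),
# where quality is any sentence-level metric in which higher is better.
Hypothesis = Tuple[Sequence[float], float]


def model_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Log-linear model score: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(weights: Sequence[float], hyps: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis the model currently ranks first."""
    return max(hyps, key=lambda h: model_score(weights, h[0]))


def perceptron_tune(
    nbest_lists: List[List[Hypothesis]],
    n_features: int,
    epochs: int = 10,
    learning_rate: float = 0.1,
) -> List[float]:
    """Averaged perceptron over n-best lists (illustrative sketch only)."""
    weights = [0.0] * n_features
    totals = [0.0] * n_features  # running sum of weights, used for averaging
    steps = 0
    for _ in range(epochs):
        for hyps in nbest_lists:
            predicted = rerank(weights, hyps)
            oracle = max(hyps, key=lambda h: h[1])  # best hypothesis by quality
            if predicted[1] < oracle[1]:
                # Move the weights towards the oracle's features and away
                # from those of the wrongly preferred hypothesis.
                for k in range(n_features):
                    weights[k] += learning_rate * (oracle[0][k] - predicted[0][k])
            for k in range(n_features):
                totals[k] += weights[k]
            steps += 1
    # Averaging the weights over all steps damps late oscillations.
    return [t / max(steps, 1) for t in totals]
```

Averaging is one common way to make perceptron weights less sensitive to the order and particular choice of training examples; whether the paper relies on averaging or on another mechanism for robustness is not stated in the abstract.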

Original language: English
Title of host publication: EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation
Publication status: Published - 1 Dec 2010
Externally published: Yes
Event: 14th Annual Conference of the European Association for Machine Translation, EAMT 2010 - Saint-Raphael, France
Duration: 27 May 2010 to 28 May 2010

Fingerprint

Neural networks
Statistical Machine Translation
English Translation
Translation System

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Software

Cite this

España-Bonet, C., & Marques, L. (2010). Robust estimation of feature weights in Statistical Machine Translation. In EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation

@inproceedings{36a11720d78f48a8bd64a114ae5a6a55,
title = "Robust estimation of feature weights in Statistical Machine Translation",
abstract = "Weights of the various components in a standard Statistical Machine Translation model are usually estimated via Minimum Error Rate Training. With this, one finds their optimum value on a development set with the expectation that these optimal weights generalise well to other test sets. However, this is not always the case when domains differ. This work uses a perceptron algorithm to learn more robust weights to be used on out-of-domain corpora without the need for specialised data. For an Arabic-to-English translation system, the generalisation of weights represents an improvement of more than 2 points of BLEU with respect to the MERT baseline using the same information.",
author = "Cristina Espa{\~n}a-Bonet and Lluis Marques",
year = "2010",
month = "12",
day = "1",
language = "English",
booktitle = "EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation",

}

TY - GEN

T1 - Robust estimation of feature weights in Statistical Machine Translation

AU - España-Bonet, Cristina

AU - Marques, Lluis

PY - 2010/12/1

Y1 - 2010/12/1

AB - Weights of the various components in a standard Statistical Machine Translation model are usually estimated via Minimum Error Rate Training. With this, one finds their optimum value on a development set with the expectation that these optimal weights generalise well to other test sets. However, this is not always the case when domains differ. This work uses a perceptron algorithm to learn more robust weights to be used on out-of-domain corpora without the need for specialised data. For an Arabic-to-English translation system, the generalisation of weights represents an improvement of more than 2 points of BLEU with respect to the MERT baseline using the same information.

UR - http://www.scopus.com/inward/record.url?scp=84857618316&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857618316&partnerID=8YFLogxK

M3 - Conference contribution

BT - EAMT 2010 - 14th Annual Conference of the European Association for Machine Translation

ER -