Parameter optimization for statistical machine translation

It pays to learn from hard examples

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find out that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications on how to pick good translators and how to select useful data for parameter optimization in SMT.
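Strategies (a) and (b) from the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each alternative input set already has a corpus-level score and per-sentence scores (for example from a baseline system), and that "hardest" can be read as "lowest-scoring"; the abstract does not specify either point. The function names and the shape of the score dictionaries are hypothetical.

```python
from typing import Dict, List


def select_dataset(scores: Dict[str, float], hardest: bool = True) -> str:
    """Strategy (a): pick one whole input set by its corpus-level score.

    Assumption: a lower score means a harder input set.
    """
    chooser = min if hardest else max
    return chooser(scores, key=scores.get)


def select_per_sentence(
    inputs: Dict[str, List[str]],
    sent_scores: Dict[str, List[float]],
    hardest: bool = True,
) -> List[str]:
    """Strategy (b): for each sentence, pick one of the alternative inputs.

    inputs[name][i] is the i-th sentence of input set `name`;
    sent_scores[name][i] is its (hypothetical) sentence-level score.
    """
    names = sorted(inputs)  # fixed order so that ties break deterministically
    chooser = min if hardest else max
    n_sentences = len(inputs[names[0]])
    selected = []
    for i in range(n_sentences):
        picked = chooser(names, key=lambda name: sent_scores[name][i])
        selected.append(inputs[picked][i])
    return selected
```

Calling `select_dataset(scores)` returns the name of the lowest-scoring input set, and `hardest=False` switches to the highest-scoring one, which is the choice the abstract reports as worse for tuning. Strategy (c), fusing the inputs into a synthetic one, is not sketched here because the abstract gives no detail on how the fusion is done.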

Original language: English
Title of host publication: International Conference Recent Advances in Natural Language Processing, RANLP
Pages: 504-510
Number of pages: 7
Publication status: Published - 2013
Event: 9th International Conference on Recent Advances in Natural Language Processing, RANLP 2013 - Hissar, Bulgaria
Duration: 9 Sep 2013 to 11 Sep 2013

Other

Other: 9th International Conference on Recent Advances in Natural Language Processing, RANLP 2013
Country: Bulgaria
City: Hissar
Period: 9/9/13 to 11/9/13

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Nakov, P., Khalid Al Obaidli, F., Guzmán, F., & Vogel, S. (2013). Parameter optimization for statistical machine translation: It pays to learn from hard examples. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 504-510).

Parameter optimization for statistical machine translation: It pays to learn from hard examples. / Nakov, Preslav; Khalid Al Obaidli, Fahad; Guzmán, Francisco; Vogel, Stephan.

International Conference Recent Advances in Natural Language Processing, RANLP. 2013. p. 504-510.

Nakov, P, Khalid Al Obaidli, F, Guzmán, F & Vogel, S 2013, Parameter optimization for statistical machine translation: It pays to learn from hard examples. in International Conference Recent Advances in Natural Language Processing, RANLP. pp. 504-510, 9th International Conference on Recent Advances in Natural Language Processing, RANLP 2013, Hissar, Bulgaria, 9/9/13.
Nakov P, Khalid Al Obaidli F, Guzmán F, Vogel S. Parameter optimization for statistical machine translation: It pays to learn from hard examples. In International Conference Recent Advances in Natural Language Processing, RANLP. 2013. p. 504-510.
Nakov, Preslav; Khalid Al Obaidli, Fahad; Guzmán, Francisco; Vogel, Stephan. / Parameter optimization for statistical machine translation: It pays to learn from hard examples. International Conference Recent Advances in Natural Language Processing, RANLP. 2013. pp. 504-510.
@inproceedings{5d86a7fc744e4713922a423b978c2bf0,
title = "Parameter optimization for statistical machine translation: It pays to learn from hard examples",
abstract = "Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find out that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications on how to pick good translators and how to select useful data for parameter optimization in SMT.",
author = "Preslav Nakov and {Khalid Al Obaidli}, Fahad and Francisco Guzm{\'a}n and Stephan Vogel",
year = "2013",
language = "English",
pages = "504--510",
booktitle = "International Conference Recent Advances in Natural Language Processing, RANLP",
}

TY - GEN
T1 - Parameter optimization for statistical machine translation
T2 - It pays to learn from hard examples
AU - Nakov, Preslav
AU - Khalid Al Obaidli, Fahad
AU - Guzmán, Francisco
AU - Vogel, Stephan
PY - 2013
Y1 - 2013
N2 - Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find out that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications on how to pick good translators and how to select useful data for parameter optimization in SMT.

UR - http://www.scopus.com/inward/record.url?scp=84890539737&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890539737&partnerID=8YFLogxK
M3 - Conference contribution
SP - 504
EP - 510
BT - International Conference Recent Advances in Natural Language Processing, RANLP
ER -