Leveraging online user feedback to improve statistical machine translation

Lluís Formiga, Alberto Barron, Lluis Marques, Carlos A. Henríquez, José B. Mariño

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.
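
The third step, linear combination of a new translation model with the original one, can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption for illustration: phrase tables are modelled as plain dictionaries of conditional probabilities, and the function name `interpolate_phrase_tables` and the weight `lam` are hypothetical.

```python
def interpolate_phrase_tables(base, feedback, lam=0.8):
    """Linearly interpolate two toy phrase tables.

    Each table maps a source phrase to {target phrase: p(target | source)}.
    `lam` weights the base model and (1 - lam) the feedback-derived model.
    Both the data layout and the weight are illustrative assumptions,
    not the configuration used in the article.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    combined = {}
    for src in set(base) | set(feedback):
        base_probs = base.get(src, {})
        fb_probs = feedback.get(src, {})
        # A target missing from one table contributes probability 0 there.
        # If a source phrase occurs in only one table, its combined scores
        # therefore sum to lam or (1 - lam) rather than 1.
        combined[src] = {
            tgt: lam * base_probs.get(tgt, 0.0)
            + (1.0 - lam) * fb_probs.get(tgt, 0.0)
            for tgt in set(base_probs) | set(fb_probs)
        }
    return combined


# Toy example: user feedback shifts probability mass toward "home".
base_table = {"casa": {"house": 0.9, "home": 0.1}}
feedback_table = {"casa": {"home": 1.0}}
mixed = interpolate_phrase_tables(base_table, feedback_table, lam=0.5)
```

In this sketch, a source phrase present in both tables keeps a proper probability distribution after mixing, while a phrase seen in only one table is scaled down by its model's weight; a real system would have to decide how to handle that case.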

Original language: English
Pages (from-to): 159-192
Number of pages: 34
Journal: Journal of Artificial Intelligence Research
Volume: 54
Publication status: Published - 15 Sep 2015

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Leveraging online user feedback to improve statistical machine translation. / Formiga, Lluís; Barron, Alberto; Marques, Lluis; Henríquez, Carlos A.; Mariño, José B.

In: Journal of Artificial Intelligence Research, Vol. 54, 15.09.2015, p. 159-192.

@article{9e4892d8f9c149418bc91228d07fd665,
title = "Leveraging online user feedback to improve statistical machine translation",
abstract = "In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.",
author = "Formiga, Llu{\'i}s and Barron, Alberto and Marques, Lluis and Henr{\'i}quez, {Carlos A.} and Mari{\~n}o, {Jos{\'e} B.}",
year = "2015",
month = "9",
day = "15",
language = "English",
volume = "54",
pages = "159--192",
journal = "Journal of Artificial Intelligence Research",
issn = "1076-9757",
publisher = "Morgan Kaufmann Publishers, Inc.",

}

TY - JOUR

T1 - Leveraging online user feedback to improve statistical machine translation

AU - Formiga, Lluís

AU - Barron, Alberto

AU - Marques, Lluis

AU - Henríquez, Carlos A.

AU - Mariño, José B.

PY - 2015

Y1 - 2015/09/15

AB - In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.

UR - http://www.scopus.com/inward/record.url?scp=84943760273&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84943760273&partnerID=8YFLogxK

M3 - Article

VL - 54

SP - 159

EP - 192

JO - Journal of Artificial Intelligence Research

JF - Journal of Artificial Intelligence Research

SN - 1076-9757

ER -