A hybrid machine translation architecture guided by syntax

Gorka Labaka, Cristina España-Bonet, Lluis Marques, Kepa Sarasola

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

This article presents a hybrid architecture which combines rule-based machine translation (RBMT) with phrase-based statistical machine translation (SMT). The hybrid translation system is guided by the rule-based engine. Before the transfer step, a varied set of partial candidate translations is calculated with the SMT system and used to enrich the tree-based representation with more translation alternatives. The final translation is constructed by choosing the most probable combination among the available fragments using monotone statistical decoding following the order provided by the rule-based system. We apply the hybrid model to a pair of distantly related languages, Spanish and Basque, and perform extensive experimentation on two different corpora. According to our empirical evaluation, the hybrid approach outperforms the best individual system across a varied set of automatic translation evaluation metrics. Following some output analysis to better understand the behaviour of the hybrid system, we explore the possibility of adding alternative parse trees and extra features to the hybrid decoder. Finally, we present a twofold manual evaluation of the translation systems studied in this paper, consisting of (i) a pairwise output comparison and (ii) an individual task-oriented evaluation using HTER. Interestingly, the manual evaluation shows some contradictory results with respect to the automatic evaluation; humans tend to prefer the translations from the RBMT system over the statistical and hybrid translations.
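
As a rough illustration of the decoding step described in the abstract, the Python sketch below picks one translation fragment per position while keeping the positions in a fixed order, which is the essence of monotone decoding. It is not the authors' implementation: the function name monotone_decode, the flat list-of-slots layout, the (text, log-probability) candidate tuples and the transition scoring callback are all assumptions made for illustration, and the actual system works over a tree-based representation with a phrase-based statistical decoder.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical data layout: a candidate is (fragment text, log-probability).
Candidate = Tuple[str, float]


def monotone_decode(
    slots: List[List[Candidate]],
    transition: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Choose one candidate per slot, left to right, maximising the total score.

    The slot order is fixed (no reordering), so a single Viterbi pass finds
    the best combination. `transition(prev, nxt)` is an assumed log-score for
    placing fragment `nxt` directly after fragment `prev`.
    """
    if not slots or any(not slot for slot in slots):
        raise ValueError("every slot needs at least one candidate")

    # best[j] = (score of the best path ending in candidate j, backpointer)
    best = [(score, -1) for _, score in slots[0]]
    history = [best]
    for i in range(1, len(slots)):
        current = []
        for text, score in slots[i]:
            # Score of reaching this candidate from each previous candidate.
            prev_scores = [
                best[k][0] + transition(slots[i - 1][k][0], text)
                for k in range(len(slots[i - 1]))
            ]
            k_best = max(range(len(prev_scores)), key=prev_scores.__getitem__)
            current.append((prev_scores[k_best] + score, k_best))
        history.append(current)
        best = current

    # Follow the backpointers from the best final candidate.
    j = max(range(len(best)), key=lambda idx: best[idx][0])
    total = best[j][0]
    output = []
    for i in range(len(slots) - 1, -1, -1):
        output.append(slots[i][j][0])
        j = history[i][j][1]
    output.reverse()
    return output, total


if __name__ == "__main__":
    # Placeholder fragments; the names only mark which engine proposed them.
    demo_slots = [
        [("rbmt_1", math.log(0.4)), ("smt_1", math.log(0.6))],
        [("rbmt_2", math.log(0.7)), ("smt_2", math.log(0.3))],
    ]
    print(monotone_decode(demo_slots, lambda prev, nxt: 0.0))
```

With a transition score of zero the fragments are chosen independently per position; a non-zero transition score plays the role that a language model would play across fragment boundaries.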

Original language: English
Pages (from-to): 91-125
Number of pages: 35
Journal: Machine Translation
Volume: 28
Issue number: 2
DOI: 10.1007/s10590-014-9153-0
Publication status: Published - 1 Jan 2014

Fingerprint

Knowledge based systems
Hybrid systems
Decoding
Engines
Machine Translation
Syntax
Evaluation
Spanish language
Basque
candidacy

Keywords

  • Hybrid machine translation
  • Phrase-based statistical MT
  • Rule-based MT
  • Spanish–Basque MT

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Language and Linguistics
  • Linguistics and Language

Cite this

A hybrid machine translation architecture guided by syntax. / Labaka, Gorka; España-Bonet, Cristina; Marques, Lluis; Sarasola, Kepa.

In: Machine Translation, Vol. 28, No. 2, 01.01.2014, p. 91-125.

Labaka, Gorka ; España-Bonet, Cristina ; Marques, Lluis ; Sarasola, Kepa. / A hybrid machine translation architecture guided by syntax. In: Machine Translation. 2014 ; Vol. 28, No. 2. pp. 91-125.
@article{4b5a9d0f80314d6fab777a3423c20925,
title = "A hybrid machine translation architecture guided by syntax",
abstract = "This article presents a hybrid architecture which combines rule-based machine translation (RBMT) with phrase-based statistical machine translation (SMT). The hybrid translation system is guided by the rule-based engine. Before the transfer step, a varied set of partial candidate translations is calculated with the SMT system and used to enrich the tree-based representation with more translation alternatives. The final translation is constructed by choosing the most probable combination among the available fragments using monotone statistical decoding following the order provided by the rule-based system. We apply the hybrid model to a pair of distantly related languages, Spanish and Basque, and perform extensive experimentation on two different corpora. According to our empirical evaluation, the hybrid approach outperforms the best individual system across a varied set of automatic translation evaluation metrics. Following some output analysis to better understand the behaviour of the hybrid system, we explore the possibility of adding alternative parse trees and extra features to the hybrid decoder. Finally, we present a twofold manual evaluation of the translation systems studied in this paper, consisting of (i) a pairwise output comparison and (ii) an individual task-oriented evaluation using HTER. Interestingly, the manual evaluation shows some contradictory results with respect to the automatic evaluation; humans tend to prefer the translations from the RBMT system over the statistical and hybrid translations.",
keywords = "Hybrid machine translation, Phrase-based statistical MT, Rule-based MT, Spanish–Basque MT",
author = "Gorka Labaka and Cristina Espa{\~n}a-Bonet and Lluis Marques and Kepa Sarasola",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/s10590-014-9153-0",
language = "English",
volume = "28",
pages = "91--125",
journal = "Machine Translation",
issn = "0922-6567",
publisher = "Springer Netherlands",
number = "2",

}

TY  - JOUR
T1  - A hybrid machine translation architecture guided by syntax
AU  - Labaka, Gorka
AU  - España-Bonet, Cristina
AU  - Marques, Lluis
AU  - Sarasola, Kepa
PY  - 2014/1/1
Y1  - 2014/1/1
AB  - This article presents a hybrid architecture which combines rule-based machine translation (RBMT) with phrase-based statistical machine translation (SMT). The hybrid translation system is guided by the rule-based engine. Before the transfer step, a varied set of partial candidate translations is calculated with the SMT system and used to enrich the tree-based representation with more translation alternatives. The final translation is constructed by choosing the most probable combination among the available fragments using monotone statistical decoding following the order provided by the rule-based system. We apply the hybrid model to a pair of distantly related languages, Spanish and Basque, and perform extensive experimentation on two different corpora. According to our empirical evaluation, the hybrid approach outperforms the best individual system across a varied set of automatic translation evaluation metrics. Following some output analysis to better understand the behaviour of the hybrid system, we explore the possibility of adding alternative parse trees and extra features to the hybrid decoder. Finally, we present a twofold manual evaluation of the translation systems studied in this paper, consisting of (i) a pairwise output comparison and (ii) an individual task-oriented evaluation using HTER. Interestingly, the manual evaluation shows some contradictory results with respect to the automatic evaluation; humans tend to prefer the translations from the RBMT system over the statistical and hybrid translations.
KW  - Hybrid machine translation
KW  - Phrase-based statistical MT
KW  - Rule-based MT
KW  - Spanish–Basque MT
UR  - http://www.scopus.com/inward/record.url?scp=84911981474&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84911981474&partnerID=8YFLogxK
U2  - 10.1007/s10590-014-9153-0
DO  - 10.1007/s10590-014-9153-0
M3  - Article
AN  - SCOPUS:84911981474
VL  - 28
SP  - 91
EP  - 125
JO  - Machine Translation
JF  - Machine Translation
SN  - 0922-6567
IS  - 2
ER  -