Source language adaptation approaches for resource-poor machine translation

Pidong Wang, Preslav Nakov, Hwee Tou Ng

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Most of the world's languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields 7.26 BLEU points of improvement over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.
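The adaptation idea in the abstract can be illustrated with a toy word-substitution pass over the RICH side of a RICH–TGT bitext. This is only a minimal sketch: the word mappings below are hard-coded hypothetical Malay → Indonesian examples, whereas the paper learns weighted word- and phrase-level paraphrases from a small POOR–TGT bitext and also handles cross-lingual morphological variants.

```python
# Toy sketch of source-language adaptation (hypothetical data).
# Idea: rewrite the RICH side of a large RICH-TGT bitext to look like POOR,
# then use the adapted bitext as extra POOR-TGT training data.

# Hypothetical word-level mapping: RICH (Malay) word -> POOR (Indonesian) variant.
paraphrases = {
    "kerana": "karena",  # "because"
    "wang": "uang",      # "money"
}

def adapt_sentence(rich_sentence: str) -> str:
    """Rewrite a RICH-side sentence word by word toward the POOR language,
    leaving unmapped words unchanged."""
    return " ".join(paraphrases.get(tok, tok) for tok in rich_sentence.split())

def adapt_bitext(rich_tgt_bitext):
    """Adapt the source side of each (RICH, TGT) pair; the TGT side is kept."""
    return [(adapt_sentence(src), tgt) for src, tgt in rich_tgt_bitext]

bitext = [("dia tiada wang kerana itu", "he has no money because of that")]
print(adapt_bitext(bitext))
```

A real system would keep multiple weighted alternatives per word and pick among them with a language model, rather than committing to a single substitution as this sketch does.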

Original language: English
Pages (from-to): 277-306
Number of pages: 30
Journal: Computational Linguistics
Volume: 42
Issue number: 2
DOIs: 10.1162/COLI_a_00248
Publication status: Published - 1 Jun 2016

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Source language adaptation approaches for resource-poor machine translation. / Wang, Pidong; Nakov, Preslav; Ng, Hwee Tou.

In: Computational Linguistics, Vol. 42, No. 2, 01.06.2016, p. 277-306.

Research output: Contribution to journal › Article

@article{57a192c664a5412fbfbd19095dc4440e,
title = "Source language adaptation approaches for resource-poor machine translation",
abstract = "Most of the world languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields 7.26 BLEU points of improvement over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.",
author = "Wang, Pidong and Nakov, Preslav and Ng, {Hwee Tou}",
year = "2016",
month = jun,
day = "1",
doi = "10.1162/COLI_a_00248",
language = "English",
volume = "42",
pages = "277--306",
journal = "Computational Linguistics",
issn = "0891-2017",
publisher = "MIT Press Journals",
number = "2",

}

TY - JOUR

T1 - Source language adaptation approaches for resource-poor machine translation

AU - Wang, Pidong

AU - Nakov, Preslav

AU - Ng, Hwee Tou

PY - 2016/6/1

Y1 - 2016/6/1

N2 - Most of the world languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields 7.26 BLEU points of improvement over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.

AB - Most of the world languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields 7.26 BLEU points of improvement over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.

UR - http://www.scopus.com/inward/record.url?scp=84975076437&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84975076437&partnerID=8YFLogxK

U2 - 10.1162/COLI_a_00248

DO - 10.1162/COLI_a_00248

M3 - Article

VL - 42

SP - 277

EP - 306

JO - Computational Linguistics

JF - Computational Linguistics

SN - 0891-2017

IS - 2

ER -