Source language adaptation for resource-poor machine translation

Pidong Wang, Preslav Nakov, Hwee Tou Ng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

We propose a novel, language-independent approach for improving machine translation from a resource-poor language to X by adapting a large bi-text for a related resource-rich language and X (the same target language). We assume a small bi-text for the resource-poor language to X pair, which we use to learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language; we then adapt the former to get closer to the latter. Our experiments for Indonesian/Malay-English translation show that using the large adapted resource-rich bi-text yields 6.7 BLEU points of improvement over the unadapted one and 2.6 BLEU points over the original small bi-text. Moreover, combining the small bi-text with the adapted bi-text outperforms the corresponding combinations with the unadapted bi-text by 1.5-3 BLEU points. We also demonstrate applicability to other languages and domains.
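Illustrative sketch (not part of the original record): the abstract describes learning word-level and phrase-level paraphrases from the small bi-text and using them to adapt the source side of the large resource-rich bi-text. The following minimal Python sketch shows only the simplest word-level variant of that idea; it is not the authors' actual pipeline, which also handles phrase-level paraphrases, cross-lingual morphological variants, and more careful candidate selection. The paraphrase-table format and the file names (ms2id.paraphrases.tsv, large.ms-en.ms) are hypothetical.

# Word-level source-side adaptation sketch: rewrite each resource-rich word
# (e.g., Malay) as its most probable resource-poor paraphrase (e.g., Indonesian),
# assuming a paraphrase table was already learned from the small bi-text.
from collections import defaultdict

def load_paraphrase_table(path):
    """Read lines of the form: rich_word <tab> poor_word <tab> probability."""
    table = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            rich, poor, prob = line.rstrip("\n").split("\t")
            table[rich].append((poor, float(prob)))
    return table

def adapt_sentence(tokens, table):
    """Replace each word with its most probable paraphrase; keep the original
    word when no paraphrase is known."""
    adapted = []
    for tok in tokens:
        candidates = table.get(tok.lower())
        if candidates:
            best, _ = max(candidates, key=lambda c: c[1])
            adapted.append(best)
        else:
            adapted.append(tok)
    return adapted

if __name__ == "__main__":
    table = load_paraphrase_table("ms2id.paraphrases.tsv")  # hypothetical file
    with open("large.ms-en.ms", encoding="utf-8") as src, \
         open("large.adapted.id", "w", encoding="utf-8") as out:
        for line in src:
            out.write(" ".join(adapt_sentence(line.split(), table)) + "\n")

The adapted source side would then be paired with the unchanged English side to form the adapted bi-text, which can be combined with the small resource-poor bi-text for translation model training, as reported in the abstract.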

Original language: English
Title of host publication: EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference
Pages: 286-296
Number of pages: 11
Publication status: Published - 1 Dec 2012
Event: 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012 - Jeju Island, Korea, Republic of
Duration: 12 Jul 2012 - 14 Jul 2012

Other

Other: 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012
Country: Korea, Republic of
City: Jeju Island
Period: 12/7/12 - 14/7/12

ASJC Scopus subject areas

  • Software

Cite this

Wang, P., Nakov, P., & Ng, H. T. (2012). Source language adaptation for resource-poor machine translation. In EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference (pp. 286-296).

@inproceedings{e54bea3271004292a743ffc9b6467fc7,
    title = "Source language adaptation for resource-poor machine translation",
    abstract = "We propose a novel, language-independent approach for improving machine translation from a resource-poor language to X by adapting a large bi-text for a related resource-rich language and X (the same target language). We assume a small bi-text for the resource-poor language to X pair, which we use to learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language; we then adapt the former to get closer to the latter. Our experiments for Indonesian/Malay-English translation show that using the large adapted resource-rich bi-text yields 6.7 BLEU points of improvement over the unadapted one and 2.6 BLEU points over the original small bi-text. Moreover, combining the small bi-text with the adapted bi-text outperforms the corresponding combinations with the unadapted bi-text by 1.5-3 BLEU points. We also demonstrate applicability to other languages and domains.",
    author = "Wang, Pidong and Nakov, Preslav and Ng, {Hwee Tou}",
    year = "2012",
    month = "12",
    day = "1",
    language = "English",
    isbn = "9781937284435",
    pages = "286--296",
    booktitle = "EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference",
}
