Improved statistical machine translation for resource-poor languages using related resource-rich languages

Preslav Nakov, Hwee Tou Ng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Citations (Scopus)

Abstract

We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y for some resource-rich language X2 that is closely related to X1. The evaluation for Indonesian→English (using Malay) and Spanish→English (using Portuguese and pretending Spanish is resource-poor) shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the rivaling approaches, while using much less additional data.

Original language: English
Title of host publication: EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009
Pages: 1358-1367
Number of pages: 10
Publication status: Published - 1 Dec 2009
Externally published: Yes
Event: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009 - Singapore, Singapore
Duration: 6 Aug 2009 to 7 Aug 2009

Other

Other: 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009
Country: Singapore
City: Singapore
Period: 6/8/09 to 7/8/09

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Nakov, P., & Ng, H. T. (2009). Improved statistical machine translation for resource-poor languages using related resource-rich languages. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 1358-1367)

@inproceedings{82191a9d4ff24f8dbb9f1602de304ca6,
title = "Improved statistical machine translation for resource-poor languages using related resource-rich languages",
abstract = "We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y for some resource-rich language X2 that is closely related to X1. The evaluation for Indonesian→English (using Malay) and Spanish→English (using Portuguese and pretending Spanish is resource-poor) shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the rivaling approaches, while using much less additional data.",
author = "Nakov, Preslav and Ng, {Hwee Tou}",
year = "2009",
month = "12",
day = "1",
language = "English",
pages = "1358--1367",
booktitle = "EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009",
}

TY - GEN

T1 - Improved statistical machine translation for resource-poor languages using related resource-rich languages

AU - Nakov, Preslav

AU - Ng, Hwee Tou

PY - 2009/12/1

Y1 - 2009/12/1

N2 - We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y for some resource-rich language X2 that is closely related to X1. The evaluation for Indonesian→English (using Malay) and Spanish→English (using Portuguese and pretending Spanish is resource-poor) shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the rivaling approaches, while using much less additional data.

UR - http://www.scopus.com/inward/record.url?scp=78650661291&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650661291&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:78650661291

SP - 1358

EP - 1367

BT - EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009

ER -