Combining word-level and character-level models for machine translation between closely-related languages

Preslav Nakov, Jörg Tiedemann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

27 Citations (Scopus)

Abstract

We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model. The evaluation on Macedonian-Bulgarian movie subtitles shows an improvement of 2.84 BLEU points over a phrase-based word-level baseline.
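As an illustration of what a character-level representation of a bitext can look like, the following is a minimal sketch. It is not taken from the paper: the boundary symbol `_`, the function names, and the overlapping n-gram scheme are all assumptions made for this example. Each sentence is rewritten as a sequence of single characters, with the original word boundaries kept as an explicit token so that the word-level sentence can be restored after translation and scored with word-level BLEU.

```python
# Hypothetical helpers for illustration; names and the "_" boundary
# symbol are assumptions, not the authors' actual preprocessing.

def to_char_level(sentence, boundary="_"):
    """Split a sentence into space-separated characters.

    Word boundaries are marked with `boundary`, which is assumed
    not to occur inside any word of the input.
    """
    words = sentence.split()
    return " ".join(boundary.join(words))


def from_char_level(char_sentence, boundary="_"):
    """Invert to_char_level: glue characters together, restore spaces."""
    return "".join(char_sentence.split()).replace(boundary, " ")


def char_ngrams(sentence, n=2, boundary="_"):
    """Return overlapping character n-grams over the character-level form."""
    chars = to_char_level(sentence, boundary).split()
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


if __name__ == "__main__":
    source = "добар ден"
    chars = to_char_level(source)
    print(chars)                   # д о б а р _ д е н
    print(char_ngrams(source, 2))  # overlapping character bigrams
    print(from_char_level(chars) == source)
```

The round trip through `from_char_level` matters in this kind of setup because the output of a character-level system has to be mapped back to words before a word-level metric can be computed on it.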

Original language: English
Title of host publication: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Pages: 301-305
Number of pages: 5
Volume: 2
Publication status: Published - 1 Dec 2012
Event: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Jeju Island, Korea, Republic of
Duration: 8 Jul 2012 to 14 Jul 2012

Other

Other: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
Country: Korea, Republic of
City: Jeju Island
Period: 8/7/12 to 14/7/12

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Nakov, P., & Tiedemann, J. (2012). Combining word-level and character-level models for machine translation between closely-related languages. In 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference (Vol. 2, pp. 301-305)

Combining word-level and character-level models for machine translation between closely-related languages. / Nakov, Preslav; Tiedemann, Jörg.

50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference. Vol. 2 2012. p. 301-305.

Nakov, P & Tiedemann, J 2012, Combining word-level and character-level models for machine translation between closely-related languages. in 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference. vol. 2, pp. 301-305, 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012, Jeju Island, Korea, Republic of, 8/7/12.
Nakov P, Tiedemann J. Combining word-level and character-level models for machine translation between closely-related languages. In 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference. Vol. 2. 2012. p. 301-305
Nakov, Preslav ; Tiedemann, Jörg. / Combining word-level and character-level models for machine translation between closely-related languages. 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference. Vol. 2 2012. pp. 301-305
@inproceedings{8668f9759d28492d9b069bdb8d0f53b3,
title = "Combining word-level and character-level models for machine translation between closely-related languages",
abstract = "We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model. The evaluation on Macedonian-Bulgarian movie subtitles shows an improvement of 2.84 BLEU points over a phrase-based word-level baseline.",
author = "Preslav Nakov and J{\"o}rg Tiedemann",
year = "2012",
month = "12",
day = "1",
language = "English",
isbn = "9781937284251",
volume = "2",
pages = "301--305",
booktitle = "50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference",

}

TY - GEN

T1 - Combining word-level and character-level models for machine translation between closely-related languages

AU - Nakov, Preslav

AU - Tiedemann, Jörg

PY - 2012/12/1

Y1 - 2012/12/1

N2 - We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model. The evaluation on Macedonian-Bulgarian movie subtitles shows an improvement of 2.84 BLEU points over a phrase-based word-level baseline.

AB - We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model. The evaluation on Macedonian-Bulgarian movie subtitles shows an improvement of 2.84 BLEU points over a phrase-based word-level baseline.

UR - http://www.scopus.com/inward/record.url?scp=84877731490&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877731490&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781937284251

VL - 2

SP - 301

EP - 305

BT - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference

ER -