Model with minimal translation units, but decode with phrases

Nadir Durrani, Alexander Fraser, Helmut Schmid

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. While the N-gram-based framework provides a better model that captures both source and target contexts and avoids spurious phrasal segmentation, the ability to memorize and produce larger translation units gives an edge to the phrase-based systems during decoding, in terms of better search performance and superior selection of translation units. In this paper we combine N-gram-based modeling with phrase-based decoding, and obtain the benefits of both approaches. Our experiments show that using this combination not only improves the search accuracy of the N-gram model but that it also improves the BLEU scores. Our system outperforms state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems by a significant margin on German, French and Spanish to English translation tasks.
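
The idea in the abstract, scoring with a model over minimal translation units while extending hypotheses one phrase at a time, can be illustrated with a toy sketch. This is not the authors' implementation: the training data, the add-one smoothing, the bigram order, and the names `train_bigram` and `score_phrase` are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each sentence pair is assumed to be already segmented
# into minimal translation units (MTUs), written as (source, target) tuples.
TRAINING = [
    [("das", "the"), ("haus", "house"), ("ist", "is"), ("klein", "small")],
    [("das", "the"), ("haus", "house"), ("ist", "is"), ("gross", "big")],
]

START = ("<s>", "<s>")  # sentence-start marker, also an assumption


def train_bigram(sequences):
    """Count MTU bigrams; return an add-one smoothed log-probability function."""
    bigram = defaultdict(int)
    context = defaultdict(int)
    vocab = {START}
    for seq in sequences:
        prev = START
        for unit in seq:
            bigram[(prev, unit)] += 1
            context[prev] += 1
            vocab.add(unit)
            prev = unit

    def logprob(prev, unit):
        return math.log((bigram[(prev, unit)] + 1) / (context[prev] + len(vocab)))

    return logprob


def score_phrase(logprob, prev, phrase_units):
    """Score one phrase-sized decoding step as the sum of its MTU bigram scores.

    Returns the score and the last MTU, which is the context for the next step.
    """
    total = 0.0
    for unit in phrase_units:
        total += logprob(prev, unit)
        prev = unit
    return total, prev
```

In this sketch a decoder could add the two-unit phrase "das haus" / "the house" in a single step, yet the model score is the same as adding the two units one at a time, so the choice of phrasal segmentation does not change the score.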

Original language: English
Title of host publication: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 9781937284473
Publication status: Published - 2013
Externally published: Yes
Event: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States
Duration: 9 Jun 2013 to 14 Jun 2013

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language

Cite this

Durrani, N., Fraser, A., & Schmid, H. (2013). Model with minimal translation units, but decode with phrases. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp. 1-11). Association for Computational Linguistics (ACL).

@inproceedings{9fc09d32d3a045bf8409137d3596a55b,
title = "Model with minimal translation units, but decode with phrases",
abstract = "N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. While the N-gram-based framework provides a better model that captures both source and target contexts and avoids spurious phrasal segmentation, the ability to memorize and produce larger translation units gives an edge to the phrase-based systems during decoding, in terms of better search performance and superior selection of translation units. In this paper we combine N-gram-based modeling with phrase-based decoding, and obtain the benefits of both approaches. Our experiments show that using this combination not only improves the search accuracy of the N-gram model but that it also improves the BLEU scores. Our system outperforms state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems by a significant margin on German, French and Spanish to English translation tasks.",
author = "Nadir Durrani and Alexander Fraser and Helmut Schmid",
year = "2013",
language = "English",
pages = "1--11",
booktitle = "NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN
T1 - Model with minimal translation units, but decode with phrases
AU - Durrani, Nadir
AU - Fraser, Alexander
AU - Schmid, Helmut
PY - 2013
Y1 - 2013
N2 - N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. While the N-gram-based framework provides a better model that captures both source and target contexts and avoids spurious phrasal segmentation, the ability to memorize and produce larger translation units gives an edge to the phrase-based systems during decoding, in terms of better search performance and superior selection of translation units. In this paper we combine N-gram-based modeling with phrase-based decoding, and obtain the benefits of both approaches. Our experiments show that using this combination not only improves the search accuracy of the N-gram model but that it also improves the BLEU scores. Our system outperforms state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems by a significant margin on German, French and Spanish to English translation tasks.
UR - http://www.scopus.com/inward/record.url?scp=84961305869&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961305869&partnerID=8YFLogxK
M3 - Conference contribution
SP - 1
EP - 11
BT - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference
PB - Association for Computational Linguistics (ACL)
ER -