Simultaneous word-morpheme alignment for statistical machine translation

Elif Eyigöz, Daniel Gildea, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data. We obtained significant improvement of BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.

Original language: English
Title of host publication: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 32-40
Number of pages: 9
ISBN (Print): 9781937284473
Publication status: Published - 2013
Event: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013 - Atlanta, United States
Duration: 9 Jun 2013 to 14 Jun 2013

Other

Other: 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013
Country: United States
City: Atlanta
Period: 9/6/13 to 14/6/13

Fingerprint

Statistical Machine Translation
Morpheme
Alignment
Hidden Markov Model

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Linguistics and Language

Cite this

Eyigöz, E., Gildea, D., & Oflazer, K. (2013). Simultaneous word-morpheme alignment for statistical machine translation. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference (pp. 32-40). Association for Computational Linguistics (ACL).

Simultaneous word-morpheme alignment for statistical machine translation. / Eyigöz, Elif; Gildea, Daniel; Oflazer, Kemal.

NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), 2013. p. 32-40.


Eyigöz, E, Gildea, D & Oflazer, K 2013, Simultaneous word-morpheme alignment for statistical machine translation. in NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), pp. 32-40, 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2013, Atlanta, United States, 9/6/13.
Eyigöz E, Gildea D, Oflazer K. Simultaneous word-morpheme alignment for statistical machine translation. In NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL). 2013. p. 32-40
Eyigöz, Elif ; Gildea, Daniel ; Oflazer, Kemal. / Simultaneous word-morpheme alignment for statistical machine translation. NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), 2013. pp. 32-40
@inproceedings{5b2a4fcb82454522892830208d6a0b7e,
title = "Simultaneous word-morpheme alignment for statistical machine translation",
abstract = "Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data. We obtained significant improvement of BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.",
author = "Elif Eyig{\"o}z and Daniel Gildea and Kemal Oflazer",
year = "2013",
language = "English",
isbn = "9781937284473",
pages = "32--40",
booktitle = "NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference",
publisher = "Association for Computational Linguistics (ACL)",

}

TY - GEN

T1 - Simultaneous word-morpheme alignment for statistical machine translation

AU - Eyigöz, Elif

AU - Gildea, Daniel

AU - Oflazer, Kemal

PY - 2013

Y1 - 2013

N2 - Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data. We obtained significant improvement of BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.

AB - Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data. We obtained significant improvement of BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.

UR - http://www.scopus.com/inward/record.url?scp=84926184578&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926184578&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84926184578

SN - 9781937284473

SP - 32

EP - 40

BT - NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference

PB - Association for Computational Linguistics (ACL)

ER -