Aligning Turkish and English parallel texts for statistical machine translation

Ilknur D. El-Kahlout, Kemal Oflazer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment algorithms. Results from a mere 3K sentences of parallel English-Turkish texts show that we are able to link Turkish morphemes with English morphemes and function words quite successfully. We have also used the Turkish WordNet which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.
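The pipeline described in the abstract (segment both sides into morphemes, then run a standard word aligner over the resulting tokens) can be illustrated with a minimal sketch. The abstract does not name the alignment algorithm, so IBM Model 1 is used here purely as a stand-in. The six-sentence corpus is invented for illustration, and the morpheme tags `+P1sg` (first person singular possessive) and `+Loc` (locative) are hypothetical labels, not the paper's actual representation.

```python
from collections import defaultdict

# Toy parallel corpus, invented for illustration only (not the paper's data).
# English side: plain words. Turkish side: a root followed by hypothetical
# morpheme tags (+P1sg = 1st person singular possessive, +Loc = locative).
CORPUS = [
    (["house"], ["ev"]),
    (["my", "house"], ["ev", "+P1sg"]),
    (["in", "my", "house"], ["ev", "+P1sg", "+Loc"]),
    (["book"], ["kitap"]),
    (["my", "book"], ["kitap", "+P1sg"]),
    (["in", "my", "book"], ["kitap", "+P1sg", "+Loc"]),
]


def train_ibm1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(Turkish token f | English token e) with EM."""
    tr_vocab = {f for _, tr in pairs for f in tr}
    uniform = 1.0 / len(tr_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per English token
        for en, tr in pairs:
            for f in tr:
                # E-step: share each Turkish token among the English tokens
                norm = sum(t[(f, e)] for e in en)
                for e in en:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise the expected counts into probabilities
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(en, tr, t):
    """Link each Turkish token to its most probable English token."""
    return [(f, max(en, key=lambda e: t[(f, e)])) for f in tr]


if __name__ == "__main__":
    table = train_ibm1(CORPUS)
    for f, e in align(["in", "my", "house"], ["ev", "+P1sg", "+Loc"], table):
        print(f"{f} -> {e}")
```

Because the possessive and locative markers are separate tokens, the aligner can link them to the English function words "my" and "in", which is the kind of morpheme-to-function-word link the abstract reports. With unsegmented surface forms, each inflected Turkish word would be a distinct and much rarer token.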

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 616-625
Number of pages: 10
Volume: 3733 LNCS
DOIs: https://doi.org/10.1007/11569596_64
Publication status: Published - 2005
Externally published: Yes
Event: 20th International Symposium on Computer and Information Sciences, ISCIS 2005 - Istanbul
Duration: 26 Oct 2005 → 28 Oct 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3733 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 20th International Symposium on Computer and Information Sciences, ISCIS 2005
City: Istanbul
Period: 26/10/05 → 28/10/05

Fingerprint

Statistical Machine Translation
WordNet
Alignment
Glossaries
Bootstrapping
Roots
Text

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

El-Kahlout, I. D., & Oflazer, K. (2005). Aligning Turkish and English parallel texts for statistical machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3733 LNCS, pp. 616-625). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3733 LNCS). https://doi.org/10.1007/11569596_64

Aligning Turkish and English parallel texts for statistical machine translation. / El-Kahlout, Ilknur D.; Oflazer, Kemal.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3733 LNCS 2005. p. 616-625 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3733 LNCS).

El-Kahlout, ID & Oflazer, K 2005, Aligning Turkish and English parallel texts for statistical machine translation. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 3733 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 3733 LNCS, pp. 616-625, 20th International Symposium on Computer and Information Sciences, ISCIS 2005, Istanbul, 26/10/05. https://doi.org/10.1007/11569596_64

El-Kahlout ID, Oflazer K. Aligning Turkish and English parallel texts for statistical machine translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3733 LNCS. 2005. p. 616-625. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/11569596_64

El-Kahlout, Ilknur D. ; Oflazer, Kemal. / Aligning Turkish and English parallel texts for statistical machine translation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 3733 LNCS 2005. pp. 616-625 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{a847224dac5e439c8324a46ebfc6f39b,
title = "Aligning Turkish and English parallel texts for statistical machine translation",
abstract = "This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment algorithms. Results from a mere 3K sentences of parallel English-Turkish texts show that we are able to link Turkish morphemes with English morphemes and function words quite successfully. We have also used the Turkish WordNet which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.",
author = "El-Kahlout, {Ilknur D.} and Kemal Oflazer",
year = "2005",
doi = "10.1007/11569596_64",
language = "English",
isbn = "3540294147",
volume = "3733",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "616--625",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY  - GEN
T1  - Aligning Turkish and English parallel texts for statistical machine translation
AU  - El-Kahlout, Ilknur D.
AU  - Oflazer, Kemal
PY  - 2005
Y1  - 2005
N2  - This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment algorithms. Results from a mere 3K sentences of parallel English-Turkish texts show that we are able to link Turkish morphemes with English morphemes and function words quite successfully. We have also used the Turkish WordNet which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.
AB  - This paper presents a preliminary work on aligning Turkish and English parallel texts towards developing a statistical machine translation system for English and Turkish. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we have converted our parallel texts to a morphemic representation and then used standard word alignment algorithms. Results from a mere 3K sentences of parallel English-Turkish texts show that we are able to link Turkish morphemes with English morphemes and function words quite successfully. We have also used the Turkish WordNet which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.
UR  - http://www.scopus.com/inward/record.url?scp=33646507272&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33646507272&partnerID=8YFLogxK
U2  - 10.1007/11569596_64
DO  - 10.1007/11569596_64
M3  - Conference contribution
SN  - 3540294147
SN  - 9783540294146
VL  - 3733 LNCS
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 616
EP  - 625
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -