An algorithm for unsupervised transliteration mining with an application to word alignment

Hassan Sajjad, Alexander Fraser, Helmut Schmid

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supervision, and does not require linguistically informed preprocessing. We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted. We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs. Finally, we integrate the transliteration module into the GIZA++ word aligner and evaluate it on two word alignment tasks, achieving improvements in both precision and recall measured against gold standard word alignments.

Original language: English
Title of host publication: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Pages: 430-439
Number of pages: 10
Volume: 1
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, OR, United States
Duration: 19 Jun 2011 to 24 Jun 2011

Fingerprint

gold standard
supervision
experiment
language
Alignment
Transliteration
Gold Standard
Parallel Corpora

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Sajjad, H., Fraser, A., & Schmid, H. (2011). An algorithm for unsupervised transliteration mining with an application to word alignment. In ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (Vol. 1, pp. 430-439)

@inproceedings{a2d75d2cccb94b90916d7323a22abc3a,
title = "An algorithm for unsupervised transliteration mining with an application to word alignment",
abstract = "We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supervision, and does not require linguistically informed preprocessing. We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92{\%}, outperforming most of the semi-supervised systems that were submitted. We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs. Finally, we integrate the transliteration module into the GIZA++ word aligner and evaluate it on two word alignment tasks achieving improvements in both precision and recall measured against gold standard word alignments.",
author = "Hassan Sajjad and Alexander Fraser and Helmut Schmid",
year = "2011",
month = "12",
day = "1",
language = "English",
isbn = "9781932432879",
volume = "1",
pages = "430--439",
booktitle = "ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies",

}

TY - GEN

T1 - An algorithm for unsupervised transliteration mining with an application to word alignment

AU - Sajjad, Hassan

AU - Fraser, Alexander

AU - Schmid, Helmut

PY - 2011/12/1

Y1 - 2011/12/1

N2 - We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supervision, and does not require linguistically informed preprocessing. We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted. We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs. Finally, we integrate the transliteration module into the GIZA++ word aligner and evaluate it on two word alignment tasks achieving improvements in both precision and recall measured against gold standard word alignments.

UR - http://www.scopus.com/inward/record.url?scp=84859088243&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859088243&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84859088243

SN - 9781932432879

VL - 1

SP - 430

EP - 439

BT - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

ER -