A statistical model for unsupervised and semi-supervised transliteration mining

Hassan Sajjad, Alexander Fraser, Helmut Schmid

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Citations (Scopus)

Abstract

We propose a novel model to automatically extract transliteration pairs from parallel corpora. Our model is efficient, language pair independent and mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings. We model transliteration mining as an interpolation of transliteration and non-transliteration sub-models. We evaluate on NEWS 2010 shared task data and on parallel corpora with competitive results.
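The abstract describes the model only as an interpolation of two sub-models and gives no formula. As a minimal sketch, assuming a simple linear interpolation with a single weight, the idea could look like the following. The function names, the weight `lam`, and the input probabilities are hypothetical illustrations, not the authors' implementation.

```python
def interpolate(p_translit, p_non_translit, lam):
    """Mixture probability of a word pair under the two sub-models.

    p_translit     -- probability of the pair under a transliteration sub-model (assumed given)
    p_non_translit -- probability of the pair under a non-transliteration sub-model (assumed given)
    lam            -- hypothetical interpolation weight on the non-transliteration sub-model
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * p_translit + lam * p_non_translit


def posterior_non_translit(p_translit, p_non_translit, lam):
    """Posterior probability that the pair was generated by the non-transliteration sub-model."""
    total = interpolate(p_translit, p_non_translit, lam)
    if total == 0.0:
        raise ValueError("pair has zero probability under both sub-models")
    return lam * p_non_translit / total


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    print(interpolate(0.2, 0.1, 0.5))
    print(posterior_non_translit(0.2, 0.1, 0.5))
```

Under this assumption, a word pair with a low posterior for the non-transliteration sub-model would be a candidate transliteration pair; where to set that threshold is left open here.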

Original language: English
Title of host publication: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Pages: 469-477
Number of pages: 9
Volume: 1
Publication status: Published - 1 Dec 2012
Externally published: Yes
Event: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Jeju Island, Korea, Republic of
Duration: 8 Jul 2012 to 14 Jul 2012

Other

Other: 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
Country: Korea, Republic of
City: Jeju Island
Period: 8/7/12 to 14/7/12

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software

Cite this

Sajjad, H., Fraser, A., & Schmid, H. (2012). A statistical model for unsupervised and semi-supervised transliteration mining. In 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference (Vol. 1, pp. 469-477).