A log-linear block transliteration model based on bi-stream HMMs

Bing Zhao, Nguyen Bach, Ian Lane, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

We propose a novel HMM-based framework to accurately transliterate unseen named entities. The framework leverages features in letter alignment and letter n-gram pairs learned from available bilingual dictionaries. Letter classes, such as vowels/non-vowels, are integrated to further improve transliteration accuracy. The proposed transliteration system is applied to out-of-vocabulary named entities in statistical machine translation (SMT), and a significant improvement over a traditional transliteration approach is obtained. Furthermore, incorporating an automatic spell-checker based on statistics collected from web search engines improves transliteration accuracy further. The proposed system is implemented within our SMT system and applied to a real translation scenario from Arabic to English.

Original language: English
Title of host publication: NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Pages: 364-371
Number of pages: 8
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007 - Rochester, NY, United States
Duration: 22 Apr 2007 to 27 Apr 2007

Other

Other: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2007
Country: United States
City: Rochester, NY
Period: 22/4/07 to 27/4/07

Fingerprint

dictionary
search engine
vocabulary
statistics
scenario
Hidden Markov Model
Transliteration
Letters
Entity
Statistical Machine Translation
Spell
N-gram
Alignment
Machine Translation System
Bilingual Dictionary
World Wide Web

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Zhao, B., Bach, N., Lane, I., & Vogel, S. (2007). A log-linear block transliteration model based on bi-stream HMMs. In NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference (pp. 364-371)

@inproceedings{8e76d1a177ad460bbb47e1affc648a53,
title = "A log-linear block transliteration model based on bi-stream HMMs",
author = "Bing Zhao and Nguyen Bach and Ian Lane and Stephan Vogel",
year = "2007",
month = "12",
day = "1",
language = "English",
pages = "364--371",
booktitle = "NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference",

}

TY - GEN

T1 - A log-linear block transliteration model based on bi-stream HMMs

AU - Zhao, Bing

AU - Bach, Nguyen

AU - Lane, Ian

AU - Vogel, Stephan

PY - 2007/12/1

Y1 - 2007/12/1

UR - http://www.scopus.com/inward/record.url?scp=78649303007&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649303007&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:78649303007

SP - 364

EP - 371

BT - NAACL HLT 2007 - Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference

ER -