Extracting Named Entity Translingual Equivalence with Limited Resources

Fei Huang, Stephan Vogel, Alex Waibel

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

In this article we present an automatic approach to extracting Hindi-English (H-E) Named Entity (NE) translingual equivalences from bilingual parallel corpora. In the absence of a Hindi NE tagger or H-E translation dictionary, this approach adapts a Chinese-English (C-E) surface string transliteration model for H-E NE extraction. The model is initially trained using automatically extracted C-E NE pairs, then iteratively updated based on newly extracted H-E NE pairs. For each English person and location NE in each sentence pair, this approach searches for its Hindi correspondence with minimum transliteration cost and constructs an H-E NE list from the bilingual corpus. Experiments show that this approach extracted 1000 H-E NE pairs with a precision of 91.8%.
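The abstract does not give the form of the transliteration cost, so the following is only a minimal sketch of the search step ("for each English NE, find the Hindi string with minimum transliteration cost") under loudly stated assumptions. Hindi text is assumed to be romanized. The cost is assumed to be a character-level weighted edit distance with a hypothetical substitution table `sub`. Candidates are assumed to be contiguous token spans of at most a hypothetical `MAX_SPAN` tokens, and costs are normalized by length. This is not the authors' C-E surface string transliteration model, and the iterative re-estimation from newly extracted H-E pairs is not shown.

```python
"""Sketch of minimum-cost transliteration matching for NE pair extraction.

Assumptions (not from the paper): Hindi is romanized, the cost model is a
weighted character edit distance, and candidates are contiguous token spans.
"""
from typing import Dict, List, Optional, Tuple

MAX_SPAN = 3  # hypothetical upper bound on a Hindi NE's length in tokens

# Hypothetical substitution costs for character pairs that often correspond
# in transliteration; identical characters cost 0, unlisted mismatches 1.0.
SubCosts = Dict[Tuple[str, str], float]


def translit_cost(eng: str, hin: str, sub: SubCosts, indel: float = 1.0) -> float:
    """Weighted edit distance between an English name and a romanized Hindi string."""
    n, m = len(eng), len(hin)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = eng[i - 1], hin[j - 1]
            s = 0.0 if a == b else sub.get((a, b), 1.0)
            d[i][j] = min(d[i - 1][j] + indel,      # drop an English character
                          d[i][j - 1] + indel,      # insert a Hindi character
                          d[i - 1][j - 1] + s)      # substitute or match
    return d[n][m]


def best_hindi_match(eng_ne: str, hin_tokens: List[str],
                     sub: SubCosts) -> Optional[Tuple[str, float]]:
    """Return (span, cost) for the Hindi span with minimum normalized cost."""
    target = eng_ne.lower().replace(" ", "")
    best: Optional[Tuple[str, float]] = None
    for start in range(len(hin_tokens)):
        for end in range(start + 1, min(start + MAX_SPAN, len(hin_tokens)) + 1):
            span = " ".join(hin_tokens[start:end])
            cand = span.replace(" ", "")
            # Normalize by the longer string so costs are comparable across spans.
            cost = translit_cost(target, cand, sub) / max(len(target), len(cand), 1)
            if best is None or cost < best[1]:
                best = (span, cost)
    return best


if __name__ == "__main__":
    sub = {("e", "i"): 0.3}  # hypothetical: cheap e/i vowel substitution
    tokens = "dilli mein pradhanmantri ne kaha".split()
    print(best_hindi_match("Delhi", tokens, sub))
```

With these toy inputs, the English NE "Delhi" is paired with the span "dilli". A real system would estimate `sub` from training pairs, which the abstract describes as first C-E and then H-E NE pairs, rather than hand-setting it.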

Original language: English
Pages (from-to): 124-129
Number of pages: 6
Journal: ACM Transactions on Asian Language Information Processing
Volume: 2
Issue number: 2
DOIs: 10.1145/974740.974745
Publication status: Published - 1 Jun 2003
Externally published: Yes

Keywords

  • Algorithms
  • information extraction
  • Language
  • machine translation
  • Named entity translation
  • Performance
  • transliteration

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Extracting Named Entity Translingual Equivalence with Limited Resources. / Huang, Fei; Vogel, Stephan; Waibel, Alex.

In: ACM Transactions on Asian Language Information Processing, Vol. 2, No. 2, 01.06.2003, p. 124-129.

@article{53846d0cf2504475a1082980d49835f2,
title = "Extracting Named Entity Translingual Equivalence with Limited Resources",
abstract = "In this article we present an automatic approach to extracting Hindi-English (H-E) Named Entity (NE) translingual equivalences from bilingual parallel corpora. In the absence of a Hindi NE tagger or H-E translation dictionary, this approach adapts a Chinese-English (C-E) surface string transliteration model for H-E NE extraction. The model is initially trained using automatically extracted C-E NE pairs, then iteratively updated based on newly extracted H-E NE pairs. For each English person and location NE in each sentence pair, this approach searches for its Hindi correspondence with minimum transliteration cost and constructs an H-E NE list from the bilingual corpus. Experiments show that this approach extracted 1000 H-E NE pairs with a precision of 91.8{\%}.",
keywords = "Algorithms, information extraction, Language, machine translation, Named entity translation, Performance, transliteration",
author = "Fei Huang and Stephan Vogel and Alex Waibel",
year = "2003",
month = jun,
day = "1",
doi = "10.1145/974740.974745",
language = "English",
volume = "2",
pages = "124--129",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

TY - JOUR

T1 - Extracting Named Entity Translingual Equivalence with Limited Resources

AU - Huang, Fei

AU - Vogel, Stephan

AU - Waibel, Alex

PY - 2003/6/1

Y1 - 2003/6/1

N2 - In this article we present an automatic approach to extracting Hindi-English (H-E) Named Entity (NE) translingual equivalences from bilingual parallel corpora. In the absence of a Hindi NE tagger or H-E translation dictionary, this approach adapts a Chinese-English (C-E) surface string transliteration model for H-E NE extraction. The model is initially trained using automatically extracted C-E NE pairs, then iteratively updated based on newly extracted H-E NE pairs. For each English person and location NE in each sentence pair, this approach searches for its Hindi correspondence with minimum transliteration cost and constructs an H-E NE list from the bilingual corpus. Experiments show that this approach extracted 1000 H-E NE pairs with a precision of 91.8%.

AB - In this article we present an automatic approach to extracting Hindi-English (H-E) Named Entity (NE) translingual equivalences from bilingual parallel corpora. In the absence of a Hindi NE tagger or H-E translation dictionary, this approach adapts a Chinese-English (C-E) surface string transliteration model for H-E NE extraction. The model is initially trained using automatically extracted C-E NE pairs, then iteratively updated based on newly extracted H-E NE pairs. For each English person and location NE in each sentence pair, this approach searches for its Hindi correspondence with minimum transliteration cost and constructs an H-E NE list from the bilingual corpus. Experiments show that this approach extracted 1000 H-E NE pairs with a precision of 91.8%.

KW - Algorithms

KW - information extraction

KW - Language

KW - machine translation

KW - Named entity translation

KW - Performance

KW - transliteration

UR - http://www.scopus.com/inward/record.url?scp=80053281856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053281856&partnerID=8YFLogxK

U2 - 10.1145/974740.974745

DO - 10.1145/974740.974745

M3 - Article

VL - 2

SP - 124

EP - 129

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 2

ER -