Clustering and classifying person names by origin

Fei Huang, Stephan Vogel, Alex Waibel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

In natural language processing, information about a person's geographical origin is an important feature for named entity transliteration and question answering. We propose a language-independent framework for clustering and classifying name origins. Given a small number of bilingual name translation pairs labeled with their origins, we measure the similarity between origins based on the perplexities of character-level name language models and translation models. We group similar origins into clusters and then train a Bayesian classifier with different features. The classifier achieves 84% accuracy using source names only, and 91% using both source and target name pairs. We apply the origin clustering and classification technique to a name transliteration task: the cluster-specific transliteration model improves transliteration accuracy from 3.8% to 55% and reduces the transliteration character error rate from 50.3 to 13.5. Adding more unlabeled name pairs to the cluster-specific transliteration model improves accuracy further.
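The core pipeline described above (score how well one origin's character model predicts another origin's names, merge similar origins, then classify with a Bayesian rule) can be sketched in Python. This is a minimal illustration under assumptions that are ours, not the paper's: a character bigram model with add-one smoothing stands in for the name language model, translation models and target-side names are left out, origin similarity is taken as the averaged cross-perplexity, clustering is a greedy pairwise merge, and every name (`CharBigramLM`, `cluster_origins`, `OriginClassifier`, the toy origins and names) is hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # hypothetical name-boundary markers


class CharBigramLM:
    """Add-one smoothed character bigram model trained on a list of names."""

    def __init__(self, names):
        self.bigrams = Counter()  # counts of (previous char, next char)
        self.history = Counter()  # counts of each char used as a context
        self.vocab = {EOS}        # symbols the model can predict
        for name in names:
            chars = [BOS] + list(name.lower()) + [EOS]
            self.vocab.update(chars[1:])
            for prev, nxt in zip(chars, chars[1:]):
                self.bigrams[(prev, nxt)] += 1
                self.history[prev] += 1

    def logprob(self, name):
        chars = [BOS] + list(name.lower()) + [EOS]
        size = len(self.vocab) + 1  # +1 lumps all unseen characters into one class
        return sum(
            math.log((self.bigrams[(p, n)] + 1) / (self.history[p] + size))
            for p, n in zip(chars, chars[1:])
        )

    def perplexity(self, names):
        total = sum(self.logprob(n) for n in names)
        predicted = sum(len(n) + 1 for n in names)  # each character plus EOS
        return math.exp(-total / predicted)


def origin_distance(names_a, names_b, lm_a, lm_b):
    """Symmetric distance: how poorly each model predicts the other origin's names."""
    return 0.5 * (lm_a.perplexity(names_b) + lm_b.perplexity(names_a))


def cluster_origins(data, k):
    """Greedily merge the closest pair of clusters until k clusters remain."""
    clusters = {origin: list(names) for origin, names in data.items()}
    members = {origin: [origin] for origin in data}
    while len(clusters) > k:
        lms = {c: CharBigramLM(ns) for c, ns in clusters.items()}
        keys = list(clusters)
        pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(pairs, key=lambda p: origin_distance(
            clusters[p[0]], clusters[p[1]], lms[p[0]], lms[p[1]]))
        clusters[a] += clusters.pop(b)  # pool names; LMs are rebuilt from pooled lists
        members[a] += members.pop(b)
    return members, clusters


class OriginClassifier:
    """Picks argmax_c [log P(c) + log P(name | c)] over origin clusters."""

    def __init__(self, clusters):
        total = sum(len(ns) for ns in clusters.values())
        self.models = {
            c: (math.log(len(ns) / total), CharBigramLM(ns))
            for c, ns in clusters.items()
        }

    def predict(self, name):
        return max(self.models,
                   key=lambda c: self.models[c][0] + self.models[c][1].logprob(name))


# Toy, made-up "origins" chosen only so the behaviour is easy to see.
toy = {
    "origin1": ["abab", "aabb", "baba"],
    "origin2": ["abba", "bbaa", "abab"],
    "origin3": ["xyzx", "zyxz", "xxyz"],
}
members, pooled = cluster_origins(toy, k=2)
clf = OriginClassifier(pooled)
print(members)              # {'origin1': ['origin1', 'origin2'], 'origin3': ['origin3']}
print(clf.predict("xyzx"))  # origin3
```

A lower cross-perplexity means two origins share character patterns, so their pooled data can train a single cluster-specific model; the classifier then adds a log prior to each cluster model's log-likelihood, which is one simple form of a Bayesian classifier.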

Original language: English
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Pages: 1056-1061
Number of pages: 6
Volume: 3
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05 - Pittsburgh, PA, United States
Duration: 9 Jul 2005 to 13 Jul 2005



ASJC Scopus subject areas

  • Software

Cite this

Huang, F., Vogel, S., & Waibel, A. (2005). Clustering and classifying person names by origin. In Proceedings of the National Conference on Artificial Intelligence (Vol. 3, pp. 1056-1061).

@inproceedings{1837960a27384b31b2b0cc6a393871d0,
title = "Clustering and classifying person names by origin",
author = "Fei Huang and Stephan Vogel and Alex Waibel",
year = "2005",
month = "12",
day = "1",
language = "English",
volume = "3",
pages = "1056--1061",
booktitle = "Proceedings of the National Conference on Artificial Intelligence",

}

TY - GEN

T1 - Clustering and classifying person names by origin

AU - Huang, Fei

AU - Vogel, Stephan

AU - Waibel, Alex

PY - 2005/12/1

Y1 - 2005/12/1

UR - http://www.scopus.com/inward/record.url?scp=29344468503&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=29344468503&partnerID=8YFLogxK

M3 - Conference contribution

VL - 3

SP - 1056

EP - 1061

BT - Proceedings of the National Conference on Artificial Intelligence

ER -