Mining key phrase translations from web corpora

Fei Huang, Ying Zhang, Stephan Vogel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

47 Citations (Scopus)

Abstract

Key phrases are usually among the most information-bearing linguistic structures. Translating them correctly will improve many natural language processing applications. We propose a new framework to mine key phrase translations from web corpora. We submit a source phrase to a search engine as a query, then expand queries by adding the translations of topic-relevant hint words from the returned snippets. We retrieve mixed-language web pages based on the expanded queries. Finally, we extract the key phrase translation from the second-round returned web page snippets with phonetic, semantic and frequency-distance features. We achieve 46% phrase translation accuracy when using top 10 returned snippets, and 80% accuracy with 165 snippets. Both results are significantly better than several existing methods.

Original language: English
Title of host publication: HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 483-490
Number of pages: 8
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, HLT/EMNLP 2005, Co-located with the 2005 Document Understanding Conference, DUC and the 9th International Workshop on Parsing Technologies, IWPT - Vancouver, BC, Canada
Duration: 6 Oct 2005 - 8 Oct 2005

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Huang, F., Zhang, Y., & Vogel, S. (2005). Mining key phrase translations from web corpora. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 483-490)

@inproceedings{71fef114fc464f3782978ebcb4b8fd2f,
title = "Mining key phrase translations from web corpora",
abstract = "Key phrases are usually among the most information-bearing linguistic structures. Translating them correctly will improve many natural language processing applications. We propose a new framework to mine key phrase translations from web corpora. We submit a source phrase to a search engine as a query, then expand queries by adding the translations of topic-relevant hint words from the returned snippets. We retrieve mixed-language web pages based on the expanded queries. Finally, we extract the key phrase translation from the second-round returned web page snippets with phonetic, semantic and frequency-distance features. We achieve 46{\%} phrase translation accuracy when using top 10 returned snippets, and 80{\%} accuracy with 165 snippets. Both results are significantly better than several existing methods.",
author = "Fei Huang and Ying Zhang and Stephan Vogel",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "483--490",
booktitle = "HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

}

TY - GEN
T1 - Mining key phrase translations from web corpora
AU - Huang, Fei
AU - Zhang, Ying
AU - Vogel, Stephan
PY - 2005/12/1
Y1 - 2005/12/1
N2 - Key phrases are usually among the most information-bearing linguistic structures. Translating them correctly will improve many natural language processing applications. We propose a new framework to mine key phrase translations from web corpora. We submit a source phrase to a search engine as a query, then expand queries by adding the translations of topic-relevant hint words from the returned snippets. We retrieve mixed-language web pages based on the expanded queries. Finally, we extract the key phrase translation from the second-round returned web page snippets with phonetic, semantic and frequency-distance features. We achieve 46% phrase translation accuracy when using top 10 returned snippets, and 80% accuracy with 165 snippets. Both results are significantly better than several existing methods.
UR - http://www.scopus.com/inward/record.url?scp=80053267981&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053267981&partnerID=8YFLogxK
M3 - Conference contribution
SP - 483
EP - 490
BT - HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
ER -