Benefits of the 'massively parallel Rosetta Stone': Cross-language information retrieval with over 30 languages

Peter A. Chew, Ahmed Abdelali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach which uses parallel aligned corpora and Latent Semantic Indexing. Most, if not all, previous work which follows this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is 'massively parallel' in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each in over 30,000 documents. Given the size of this dataset, a 'massively parallel' approach was also necessitated in the more usual computational sense. Our results indicate that, far from adding more noise, more linguistic parallelism is better when it comes to cross-language retrieval precision, in addition to the self-evident benefit that CLIR can be performed on more languages.
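The general idea of Latent Semantic Indexing over a parallel aligned corpus can be illustrated with a minimal sketch. This is not the authors' implementation: the three toy passages, the language-prefixed tokens, the use of raw counts instead of a term-weighting scheme, and the names `rank_documents`, `fold_in`, and `K` are all assumptions made for illustration. Each column of the term-by-document matrix concatenates every translation of one passage, and a monolingual query is then folded into the resulting space. In practice the number of retained dimensions would be far smaller than the number of documents; with only three toy passages, this sketch keeps all three.

```python
import numpy as np

# Invented toy data: three passages, each available in three languages.
PARALLEL_DOCS = [
    {"en": "the king rules the land",
     "fr": "le roi gouverne le pays",
     "es": "el rey gobierna la tierra"},
    {"en": "the shepherd watches the sheep",
     "fr": "le berger garde les brebis",
     "es": "el pastor guarda las ovejas"},
    {"en": "the river flows to the sea",
     "fr": "le fleuve coule vers la mer",
     "es": "el rio corre hacia el mar"},
]


def tokens(lang, text):
    """Tag each word with its language so look-alike words stay separate terms."""
    return [f"{lang}:{word}" for word in text.lower().split()]


def all_tokens(doc):
    """Concatenate the tokens of every translation of one passage."""
    return [t for lang, text in doc.items() for t in tokens(lang, text)]


VOCAB = sorted({t for doc in PARALLEL_DOCS for t in all_tokens(doc)})
TERM_INDEX = {term: i for i, term in enumerate(VOCAB)}


def bag_of_words(token_list):
    """Raw term-count vector over the multilingual vocabulary (no weighting)."""
    vec = np.zeros(len(VOCAB))
    for t in token_list:
        if t in TERM_INDEX:
            vec[TERM_INDEX[t]] += 1.0
    return vec


# Term-by-document matrix: one column per passage, all languages stacked.
A = np.column_stack([bag_of_words(all_tokens(doc)) for doc in PARALLEL_DOCS])

# Singular value decomposition; keep the first K dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
K = 3  # assumption: the toy corpus is too small to truncate further
U_k, s_k = U[:, :K], s[:K]


def fold_in(lang, text):
    """Project a monolingual text into the K-dimensional space."""
    return (U_k.T @ bag_of_words(tokens(lang, text))) / s_k


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_documents(query_lang, query_text, target_lang):
    """Cosine similarity of the query to each passage's target-language text."""
    q = fold_in(query_lang, query_text)
    return [cosine(q, fold_in(target_lang, doc[target_lang]))
            for doc in PARALLEL_DOCS]


if __name__ == "__main__":
    # A French query scored against the English side of each passage.
    print(rank_documents("fr", "le roi", "en"))
```

Because the query and the target documents share no surface terms, any match comes only from the co-occurrence of translations within the training columns, which is the property the parallel corpus is meant to supply.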

Original language: English
Title of host publication: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Pages: 872-879
Number of pages: 8
Publication status: Published - 1 Dec 2007
Event: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 - Prague, Czech Republic
Duration: 23 Jun 2007 to 30 Jun 2007

Publication series

Name: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics

Other

Other: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007
Country: Czech Republic
City: Prague
Period: 23/6/07 to 30/6/07

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Chew, P. A., & Abdelali, A. (2007). Benefits of the 'massively parallel Rosetta Stone': Cross-language information retrieval with over 30 languages. In ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp. 872-879).