Benefits of the 'massively parallel Rosetta Stone'

Cross-language information retrieval with over 30 languages

Peter A. Chew, Ahmed Abdelali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach which uses parallel aligned corpora and Latent Semantic Indexing. Most, if not all, previous work which follows this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is 'massively parallel' in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each in over 30,000 documents. Given the size of this dataset, a 'massively parallel' approach was also necessitated in the more usual computational sense. Our results indicate that, far from adding more noise, more linguistic parallelism is better when it comes to cross-language retrieval precision, in addition to the self-evident benefit that CLIR can be performed on more languages.
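The standard approach the abstract extends can be illustrated in a few lines. In the usual LSI formulation of CLIR, each training column is one aligned unit with all of its translations concatenated, a truncated SVD of that term-by-document matrix yields a shared latent space, and a monolingual query or document is mapped into that space by folding in (q^T U_k S_k^{-1}) and compared by cosine similarity. The Python below is a minimal sketch under stated assumptions, not the authors' code: TOY_UNITS is a hypothetical three-unit corpus with word-for-word English, French and Spanish translations, terms are raw counts with no weighting, and numpy's dense SVD stands in for the parallel computation the abstract mentions. The function names (tokens, build_matrix, train_lsi, fold_in, retrieve) are illustrative only.

```python
# Minimal sketch of LSI-based cross-language retrieval trained on a parallel
# aligned corpus. Illustrative assumptions (not the authors' implementation):
# whitespace tokenization, raw term counts with no weighting, a hypothetical
# three-unit toy corpus, and numpy's dense SVD instead of a parallel SVD.
import numpy as np

# Hypothetical aligned units: each maps a language code to one translation.
TOY_UNITS = [
    {"en": "the cat sleeps", "fr": "le chat dort", "es": "el gato duerme"},
    {"en": "the dog runs", "fr": "le chien court", "es": "el perro corre"},
    {"en": "a bird sings", "fr": "un oiseau chante", "es": "un pájaro canta"},
]


def tokens(text, lang):
    # Prefix each token with its language so "un" (fr) and "un" (es) stay distinct.
    return [f"{lang}:{tok}" for tok in text.lower().split()]


def build_matrix(units):
    """Term-by-document matrix in which each column concatenates every
    translation of one aligned unit, so all languages share that column."""
    vocab = {}
    for unit in units:
        for lang, text in unit.items():
            for term in tokens(text, lang):
                vocab.setdefault(term, len(vocab))
    X = np.zeros((len(vocab), len(units)))
    for j, unit in enumerate(units):
        for lang, text in unit.items():
            for term in tokens(text, lang):
                X[vocab[term], j] += 1
    return X, vocab


def train_lsi(X, k):
    """Truncated SVD X ~ U_k S_k V_k^T; k must not exceed the rank of X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]


def fold_in(text, lang, vocab, Uk, sk):
    """Project a monolingual text into the shared space: q^T U_k S_k^{-1}."""
    q = np.zeros(len(vocab))
    for term in tokens(text, lang):
        if term in vocab:  # out-of-vocabulary terms are ignored
            q[vocab[term]] += 1
    return (q @ Uk) / sk


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def retrieve(query_vec, doc_vecs):
    """Return the index of the best-scoring document and all cosine scores."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    X, vocab = build_matrix(TOY_UNITS)
    Uk, sk = train_lsi(X, k=3)
    # Index only the French side, then query it in English.
    fr_docs = [fold_in(u["fr"], "fr", vocab, Uk, sk) for u in TOY_UNITS]
    query = fold_in("the dog runs", "en", vocab, Uk, sk)
    best, scores = retrieve(query, fr_docs)
    print(TOY_UNITS[best]["fr"], [round(x, 3) for x in scores])
```

Run as a script, it indexes only the French side and queries it in English. Because the toy translations are idealized word-for-word equivalents, an English text folds in to the same point as its own French translation, so "le chien court" ranks first. Adding a language only adds rows (one more key per unit), which is how more linguistic parallelism enters this formulation. At the scale the abstract describes, a truncated sparse solver such as scipy.sparse.linalg.svds, or a distributed SVD, would replace the dense call; that is an assumption about tooling, not a description of the paper's setup.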

Original language: English
Title of host publication: ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
Pages: 872-879
Number of pages: 8
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007, Prague, Czech Republic
Duration: 23 Jun 2007 to 30 Jun 2007

Fingerprint

information retrieval
language
linguistics
indexing
semantics
Cross-language
Parallel Corpora
experience
Computational

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Chew, P. A., & Abdelali, A. (2007). Benefits of the 'massively parallel Rosetta Stone': Cross-language information retrieval with over 30 languages. In ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (pp. 872-879).

@inproceedings{214bd7d281854db89a32aeb0173d682a,
title = "Benefits of the 'massively parallel Rosetta Stone': Cross-language information retrieval with over 30 languages",
abstract = "In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach which uses parallel aligned corpora and Latent Semantic Indexing. Most, if not all, previous work which follows this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is 'massively parallel' in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each in over 30,000 documents. Given the size of this dataset, a 'massively parallel' approach was also necessitated in the more usual computational sense. Our results indicate that, far from adding more noise, more linguistic parallelism is better when it comes to cross-language retrieval precision, in addition to the self-evident benefit that CLIR can be performed on more languages.",
author = "Chew, {Peter A.} and Ahmed Abdelali",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9781932432862",
pages = "872--879",
booktitle = "ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics",

}

TY  - GEN
T1  - Benefits of the 'massively parallel Rosetta Stone'
T2  - Cross-language information retrieval with over 30 languages
AU  - Chew, Peter A.
AU  - Abdelali, Ahmed
PY  - 2007/12/1
Y1  - 2007/12/1
AB  - In this paper, we describe our experiences in extending a standard cross-language information retrieval (CLIR) approach which uses parallel aligned corpora and Latent Semantic Indexing. Most, if not all, previous work which follows this approach has focused on bilingual retrieval; two examples involve the use of French-English or English-Greek parallel corpora. Our extension to the approach is 'massively parallel' in two senses, one linguistic and the other computational. First, we make use of a parallel aligned corpus consisting of almost 50 parallel translations in over 30 distinct languages, each in over 30,000 documents. Given the size of this dataset, a 'massively parallel' approach was also necessitated in the more usual computational sense. Our results indicate that, far from adding more noise, more linguistic parallelism is better when it comes to cross-language retrieval precision, in addition to the self-evident benefit that CLIR can be performed on more languages.
UR  - http://www.scopus.com/inward/record.url?scp=79957464209&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79957464209&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9781932432862
SP  - 872
EP  - 879
BT  - ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics
ER  - 