A statistical approach to crosslingual natural language tasks

David Pinto, Jorge Civera, Alfons Juan, Paolo Rosso, Alberto Barron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The huge volume of documents written in multiple languages on the Internet has led to the investigation of novel approaches for dealing with information of this kind. We propose a statistical approach to crosslingual natural language tasks. In particular, we apply IBM alignment model 1 to obtain a statistical bilingual dictionary, which may then be used to approximate the relatedness probability of two given documents written in different languages. The successful experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) highlight the benefit of the presented statistical approach.
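As a rough illustration of the idea in the abstract, the sketch below trains IBM Model 1 with EM on a toy parallel corpus to obtain a translation table t(f | e), then uses that table to score how related a document in one language is to a document in the other. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `train_ibm1` and `log_relatedness`, the `<NULL>` token, the probability floor, and the three-sentence German/English corpus are all hypothetical choices made for illustration, and the abstract does not specify the exact scoring formula the paper uses.

```python
import math
from collections import defaultdict

NULL = "<NULL>"  # assumed empty-word token, as in the usual Model 1 setup


def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = p(f | e) with EM from (f_tokens, e_tokens) pairs."""
    pairs = [(fs, [NULL] + es) for fs, es in pairs]
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation over every co-occurring word pair.
    t = {(f, e): 1.0 / len(f_vocab) for fs, es in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each word f over the words e it may align to.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per word e.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def log_relatedness(t, fs, es, floor=1e-12):
    """Model 1 style log p(fs | es); `floor` is an assumed smoothing constant."""
    es = [NULL] + es
    score = 0.0
    for f in fs:
        p = sum(t.get((f, e), 0.0) for e in es) / len(es)
        score += math.log(max(p, floor))
    return score


# Toy parallel corpus (illustrative data only, not from the paper).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
table = train_ibm1(corpus)
related = log_relatedness(table, ["das", "haus"], ["the", "house"])
unrelated = log_relatedness(table, ["das", "haus"], ["a", "book"])
```

In this sketch a higher (less negative) score means the two documents are more likely to be translations or near-translations of each other, which is the kind of signal that could be thresholded or ranked in classification, retrieval or plagiarism settings.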

Original language: English
Title of host publication: CEUR Workshop Proceedings
Volume: 408
Publication status: Published - 2008
Externally published: Yes
Event: 4th Latin American Workshop on Non-Monotonic Reasoning, LANMR 2008 - Puebla, Mexico
Duration: 22 Oct 2008 to 24 Oct 2008

Other

Other: 4th Latin American Workshop on Non-Monotonic Reasoning, LANMR 2008
Country: Mexico
City: Puebla
Period: 22/10/08 to 24/10/08

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Pinto, D., Civera, J., Juan, A., Rosso, P., & Barron, A. (2008). A statistical approach to crosslingual natural language tasks. In CEUR Workshop Proceedings (Vol. 408)