A statistical approach to crosslingual natural language tasks

David Pinto, Jorge Civera, Alfons Juan, Paolo Rosso, Alberto Barron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The existence of huge volumes of documents written in multiple languages on the Internet has motivated the investigation of novel approaches for dealing with this kind of information. We propose a statistical approach to tackle crosslingual natural language tasks. In particular, we apply IBM alignment Model 1 to obtain a statistical bilingual dictionary, which may then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) were successful and highlight the benefit of the presented statistical approach.
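The abstract does not give the model details, so what follows is only a minimal sketch under stated assumptions. It trains IBM Model 1 lexical translation probabilities t(f|e) (the "statistical bilingual dictionary") with the textbook EM procedure. It then scores a document pair with the Model 1 likelihood, in which each target word f contributes (1/(l+1)) Σ_i t(f|e_i), averaged in log space over the target words. The functions `train_model1` and `log_relatedness`, the `NULL` token, the `floor` value and the tiny English-Spanish corpus are all hypothetical illustrations. They are not the authors' code, scoring function or data.

```python
import math
from collections import defaultdict

NULL = "<null>"  # empty source word that IBM Model 1 adds to every source sentence


def train_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] = p(f | e) with EM.

    `pairs` is a non-empty list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization of t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            es = [NULL] + es
            for f in fs:
                # E-step: posterior probability that f is aligned to each e
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts for each source word e
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def log_relatedness(src_doc, tgt_doc, t, floor=1e-9):
    """Per-word Model 1 log-likelihood of tgt_doc given src_doc.

    Each target word scores log((1 / (l + 1)) * sum_i t(f | e_i)); pairs never
    seen in training count as 0 and are floored so the logarithm stays finite.
    """
    if not tgt_doc:
        raise ValueError("tgt_doc must contain at least one token")
    es = [NULL] + src_doc
    score = 0.0
    for f in tgt_doc:
        p = sum(t.get((f, e), 0.0) for e in es) / len(es)
        score += math.log(max(p, floor))
    return score / len(tgt_doc)


# Hypothetical English-Spanish toy corpus (not the paper's data).
TOY_PAIRS = [
    (["the", "house"], ["la", "casa"]),
    (["the", "blue", "house"], ["la", "casa", "azul"]),
    (["the", "flower"], ["la", "flor"]),
]

if __name__ == "__main__":
    t = train_model1(TOY_PAIRS, iterations=10)
    print("t(casa | house) =", round(t[("casa", "house")], 3))
    for src in (["the", "house"], ["the", "flower"]):
        print(src, "->", round(log_relatedness(src, ["la", "casa"], t), 3))
```

Dividing by the target length makes scores comparable across target documents of different sizes. In a crosslingual retrieval or plagiarism setting, candidate documents could then be ranked by this score. Whether the authors normalize or rank this way is not stated in the abstract.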

Original language: English
Title of host publication: CEUR Workshop Proceedings
Volume: 408
Publication status: Published, 2008
Externally published: Yes
Event: 4th Latin American Workshop on Non-Monotonic Reasoning, LANMR 2008, Puebla, Mexico
Duration: 22 Oct 2008 to 24 Oct 2008

Fingerprint

  • Glossaries
  • Information retrieval
  • Internet

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Pinto, D., Civera, J., Juan, A., Rosso, P., & Barron, A. (2008). A statistical approach to crosslingual natural language tasks. In CEUR Workshop Proceedings (Vol. 408).

A statistical approach to crosslingual natural language tasks. / Pinto, David; Civera, Jorge; Juan, Alfons; Rosso, Paolo; Barron, Alberto.

CEUR Workshop Proceedings. Vol. 408 2008.


Pinto, D, Civera, J, Juan, A, Rosso, P & Barron, A 2008, A statistical approach to crosslingual natural language tasks. in CEUR Workshop Proceedings. vol. 408, 4th Latin American Workshop on Non-Monotonic Reasoning, LANMR 2008, Puebla, Mexico, 22/10/08.
Pinto D, Civera J, Juan A, Rosso P, Barron A. A statistical approach to crosslingual natural language tasks. In CEUR Workshop Proceedings. Vol. 408. 2008
Pinto, David ; Civera, Jorge ; Juan, Alfons ; Rosso, Paolo ; Barron, Alberto. / A statistical approach to crosslingual natural language tasks. CEUR Workshop Proceedings. Vol. 408 2008.
@inproceedings{69ceea604cdb47eab2bd51c2cf8e1fbd,
title = "A statistical approach to crosslingual natural language tasks",
abstract = "The existence of huge volumes of documents written in multiple languages on the Internet has motivated the investigation of novel approaches for dealing with this kind of information. We propose a statistical approach to tackle crosslingual natural language tasks. In particular, we apply IBM alignment Model 1 to obtain a statistical bilingual dictionary, which may then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) were successful and highlight the benefit of the presented statistical approach.",
author = "David Pinto and Jorge Civera and Alfons Juan and Paolo Rosso and Alberto Barron",
year = "2008",
language = "English",
volume = "408",
booktitle = "CEUR Workshop Proceedings",

}

TY  - GEN
T1  - A statistical approach to crosslingual natural language tasks
AU  - Pinto, David
AU  - Civera, Jorge
AU  - Juan, Alfons
AU  - Rosso, Paolo
AU  - Barron, Alberto
PY  - 2008
Y1  - 2008
N2  - The existence of huge volumes of documents written in multiple languages on the Internet has motivated the investigation of novel approaches for dealing with this kind of information. We propose a statistical approach to tackle crosslingual natural language tasks. In particular, we apply IBM alignment Model 1 to obtain a statistical bilingual dictionary, which may then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) were successful and highlight the benefit of the presented statistical approach.
AB  - The existence of huge volumes of documents written in multiple languages on the Internet has motivated the investigation of novel approaches for dealing with this kind of information. We propose a statistical approach to tackle crosslingual natural language tasks. In particular, we apply IBM alignment Model 1 to obtain a statistical bilingual dictionary, which may then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) were successful and highlight the benefit of the presented statistical approach.
UR  - http://www.scopus.com/inward/record.url?scp=84871587973&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84871587973&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 408
BT  - CEUR Workshop Proceedings
ER  - 