On cross-lingual plagiarism analysis using a statistical model

Alberto Barron, Paolo Rosso, David Pinto, Alfons Juan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Citations (Scopus)

Abstract

The automatic detection of plagiarism has gained relevance in Information Retrieval, and it becomes more complex in a multilingual setting, where the original and suspicious texts are written in different languages. From a cross-lingual perspective, a text fragment in one language is considered a plagiarism of a text in another language if their contents are semantically similar, even though they are written in different languages, and the corresponding citation or credit is not included. Our current experiments on cross-lingual plagiarism analysis are based on a statistical bilingual dictionary. This dictionary is built from a parallel corpus that contains original fragments written in one language and plagiarised versions of these fragments written in another language. Plagiarism analysis based on the statistical bilingual dictionary has shown good results in automatic cross-lingual plagiarism analysis, and we consider that it could be useful for the cross-lingual near-duplicate detection task.
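
The abstract does not say how the statistical bilingual dictionary is estimated or how fragments are scored. As a rough illustration only, the sketch below assumes an IBM Model 1 style word-translation table trained with EM on a parallel corpus, and scores a suspicious fragment by its length-normalised log-likelihood given a candidate original. The function names, the `<null>` token, the probability floor, and the toy English/Spanish sentence pairs are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

NULL = "<null>"  # assumed empty source word, as in IBM Model 1

# Made-up toy parallel corpus (original tokens, translated tokens); illustration only.
TOY_PAIRS = [
    (["the", "house"], ["la", "casa"]),
    (["the", "book"], ["el", "libro"]),
    (["a", "book"], ["un", "libro"]),
    (["a", "house"], ["una", "casa"]),
]


def train_dictionary(pairs, iterations=10):
    """Estimate t[(f, e)] = p(target word f | source word e) with EM."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every word pair
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all
        for src, tgt in pairs:
            src_n = [NULL] + src
            for f in tgt:
                # E-step: share one unit of f over the source words
                z = sum(t[(f, e)] for e in src_n)
                for e in src_n:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per source word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def score(t, src, tgt, floor=1e-9):
    """Average log-probability per token of `tgt` being a translation of `src`."""
    src_n = [NULL] + src
    logp = 0.0
    for f in tgt:
        p = sum(t.get((f, e), 0.0) for e in src_n) / len(src_n)
        logp += math.log(max(p, floor))  # floor avoids log(0) for unseen words
    return logp / max(len(tgt), 1)


if __name__ == "__main__":
    table = train_dictionary(TOY_PAIRS)
    print(score(table, ["the", "house"], ["la", "casa"]))
    print(score(table, ["the", "house"], ["un", "libro"]))
```

Under these assumptions, a higher score marks a suspicious fragment as a more likely translation of the original. Any decision threshold would have to be tuned on held-out data.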

Original language: English
Title of host publication: CEUR Workshop Proceedings
Pages: 9-13
Number of pages: 5
Volume: 377
Publication status: Published - 2008
Externally published: Yes
Event: 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008 - Patras, Greece
Duration: 22 Jul 2008 → 22 Jul 2008

Other

Other: 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008
Country: Greece
City: Patras
Period: 22/7/08 → 22/7/08

Fingerprint

Glossaries
Information retrieval
Statistical Models
Experiments

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Barron, A., Rosso, P., Pinto, D., & Juan, A. (2008). On cross-lingual plagiarism analysis using a statistical model. In CEUR Workshop Proceedings (Vol. 377, pp. 9-13)

@inproceedings{65c8889aaada43caa6f2c2db8bfe7c70,
title = "On cross-lingual plagiarism analysis using a statistical model",
author = "Alberto Barron and Paolo Rosso and David Pinto and Alfons Juan",
year = "2008",
language = "English",
volume = "377",
pages = "9--13",
booktitle = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - On cross-lingual plagiarism analysis using a statistical model

AU - Barron, Alberto

AU - Rosso, Paolo

AU - Pinto, David

AU - Juan, Alfons

PY - 2008

Y1 - 2008

UR - http://www.scopus.com/inward/record.url?scp=84885224337&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885224337&partnerID=8YFLogxK

M3 - Conference contribution

VL - 377

SP - 9

EP - 13

BT - CEUR Workshop Proceedings

ER -