Corpus and evaluation measures for automatic plagiarism detection

Alberto Barron, Martin Potthast, Paolo Rosso, Benno Stein, Andreas Eiselt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts to analyze documents for plagiarism. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes since they cannot be properly anonymized. Therefore, current evaluations found in the literature are incomparable and often not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism and new detection performance measures tailored to the evaluation of plagiarism detection algorithms.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 771-774
Number of pages: 4
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 to 23 May 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 17/5/10 to 23/5/10

Fingerprint

evaluation
information retrieval
expert
language
performance
Plagiarism
literature

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Barron, A., Potthast, M., Rosso, P., Stein, B., & Eiselt, A. (2010). Corpus and evaluation measures for automatic plagiarism detection. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 771-774). European Language Resources Association (ELRA).

Corpus and evaluation measures for automatic plagiarism detection. / Barron, Alberto; Potthast, Martin; Rosso, Paolo; Stein, Benno; Eiselt, Andreas.

Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. European Language Resources Association (ELRA), 2010. p. 771-774.

Barron, A, Potthast, M, Rosso, P, Stein, B & Eiselt, A 2010, Corpus and evaluation measures for automatic plagiarism detection. in Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. European Language Resources Association (ELRA), pp. 771-774, 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, 17/5/10.

Barron A, Potthast M, Rosso P, Stein B, Eiselt A. Corpus and evaluation measures for automatic plagiarism detection. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. European Language Resources Association (ELRA). 2010. p. 771-774.

Barron, Alberto ; Potthast, Martin ; Rosso, Paolo ; Stein, Benno ; Eiselt, Andreas. / Corpus and evaluation measures for automatic plagiarism detection. Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. European Language Resources Association (ELRA), 2010. pp. 771-774.

@inproceedings{db3b7512f0ee4048a38b59aaa8d4d29b,
title = "Corpus and evaluation measures for automatic plagiarism detection",
abstract = "The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts to analyze documents for plagiarism. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes since they cannot be properly anonymized. Therefore, current evaluations found in the literature are incomparable and often not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism and new detection performance measures tailored to the evaluation of plagiarism detection algorithms.",
author = "Alberto Barron and Martin Potthast and Paolo Rosso and Benno Stein and Andreas Eiselt",
year = "2010",
month = "1",
day = "1",
language = "English",
pages = "771--774",
booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN

T1 - Corpus and evaluation measures for automatic plagiarism detection

AU - Barron, Alberto

AU - Potthast, Martin

AU - Rosso, Paolo

AU - Stein, Benno

AU - Eiselt, Andreas

PY - 2010/1/1

Y1 - 2010/1/1

AB - The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human experts to analyze documents for plagiarism. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes since they cannot be properly anonymized. Therefore, current evaluations found in the literature are incomparable and often not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism and new detection performance measures tailored to the evaluation of plagiarism detection algorithms.

UR - http://www.scopus.com/inward/record.url?scp=85006135960&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006135960&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85006135960

SP - 771

EP - 774

BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010

PB - European Language Resources Association (ELRA)

ER -