Corpus and evaluation measures for automatic plagiarism detection

Alberto Barron, Martin Potthast, Paolo Rosso, Benno Stein, Andreas Eiselt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Easy access to texts in digital libraries and on the WWW has led to an increased number of plagiarism cases in recent years, making manual plagiarism detection infeasible at scale. Various methods for automatic plagiarism detection have been developed, with the objective of assisting human experts in analyzing documents for plagiarism. Unlike other tasks in natural language processing and information retrieval, it is not possible to publish a collection of real plagiarism cases for evaluation purposes, since such cases cannot be properly anonymized. As a result, the evaluations reported in the literature are not comparable with one another and are often not even reproducible. Our contribution in this respect is a newly developed large-scale corpus of artificial plagiarism, together with new detection performance measures tailored to the evaluation of plagiarism detection algorithms.

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
Publisher: European Language Resources Association (ELRA)
Pages: 771-774
Number of pages: 4
ISBN (Electronic): 2951740867, 9782951740860
Publication status: Published - 1 Jan 2010
Event: 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta
Duration: 17 May 2010 → 23 May 2010

Other

Other: 7th International Conference on Language Resources and Evaluation, LREC 2010
Country: Malta
City: Valletta
Period: 17/5/10 → 23/5/10

ASJC Scopus subject areas

  • Education
  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics

Cite this

Barron, A., Potthast, M., Rosso, P., Stein, B., & Eiselt, A. (2010). Corpus and evaluation measures for automatic plagiarism detection. In Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 (pp. 771-774). European Language Resources Association (ELRA).