An evaluation framework for plagiarism detection

Martin Potthast, Benno Stein, Alberto Barron, Paolo Rosso

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

214 Citations (Scopus)

Abstract

We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

Original language: English
Title of host publication: Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference
Pages: 997-1005
Number of pages: 9
Volume: 2
Publication status: Published - 2010
Externally published: Yes
Event: 23rd International Conference on Computational Linguistics, Coling 2010 - Beijing, China
Duration: 23 Aug 2010 - 27 Aug 2010

ASJC Scopus subject areas

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language

Cite this

Potthast, M., Stein, B., Barron, A., & Rosso, P. (2010). An evaluation framework for plagiarism detection. In Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference (Vol. 2, pp. 997-1005)

@inproceedings{11c2a718501149e48e8f1a8c888e8734,
title = "An evaluation framework for plagiarism detection",
abstract = "We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.",
author = "Martin Potthast and Benno Stein and Alberto Barron and Paolo Rosso",
year = "2010",
language = "English",
volume = "2",
pages = "997--1005",
booktitle = "Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference",

}

TY - GEN

T1 - An evaluation framework for plagiarism detection

AU - Potthast, Martin

AU - Stein, Benno

AU - Barron, Alberto

AU - Rosso, Paolo

PY - 2010

Y1 - 2010

N2 - We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

UR - http://www.scopus.com/inward/record.url?scp=80053426905&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053426905&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:80053426905

VL - 2

SP - 997

EP - 1005

BT - Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference

ER -