A high-performance plagiarism detection system: Notebook for PAN at CLEF 2011

Neil Cooke, Lee Gillam, Peter Wrobel, Henry Cooke, Fahad Khalid Al Obaidli

Research output: Contribution to journal (Conference article)

3 Citations (Scopus)

Abstract

In this paper we report on our high-performance plagiarism detection system, which processes the PAN plagiarism corpus for the external plagiarism detection task in relatively short timescales compared with the previously reported state of the art, while still achieving a reasonable level of detection performance (PAN 11: 4th place, PlagDet=0.2467329, Recall=0.1500480, Precision=0.7106536, Granularity=1.0058894). At the core of our system is a simple method that avoids hash-type approaches; we are unable to disclose many details of it because a patent application is in progress. We optimised performance on the PAN 10 collection and used the best parameters for the final submission. We anticipated broadly similar performance at PAN 11, modulo changes to the plagiarism cases, and 4th place this year put us between the participants who had been 5th and 6th in PAN 10.
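
The abstract does not spell out how the four reported figures relate to one another. As a minimal sketch, assuming the standard PAN evaluation definition (PlagDet is the harmonic mean of precision and recall divided by log2(1 + granularity)), the reported score can be checked as follows. The function name `plagdet` is ours and purely illustrative; it is not part of the authors' system or of any PAN tooling.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet under the assumed PAN definition: F1 / log2(1 + granularity)."""
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall (F1).
    f1 = 2 * precision * recall / (precision + recall)
    # Granularity >= 1 penalises reporting one plagiarism case as several detections.
    return f1 / math.log2(1 + granularity)


# Figures taken from the abstract (PAN 11 results).
score = plagdet(precision=0.7106536, recall=0.1500480, granularity=1.0058894)
print(f"{score:.7f}")
```

Under this assumed definition, the computed value agrees with the reported PlagDet of 0.2467329 to within rounding. The low recall is what holds the score down: precision is high and granularity is close to its ideal value of 1.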

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1177
Publication status: Published - 1 Jan 2011
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (all)
