A high-performance plagiarism detection system: Notebook for PAN at CLEF 2011

Neil Cooke, Lee Gillam, Peter Wrobel, Henry Cooke, Fahad Khalid Al Obaidli

Research output: Contribution to journal > Conference article

3 Citations (Scopus)

Abstract

In this paper we report on our high-performance plagiarism detection system, which is able to process the PAN plagiarism corpus for the external plagiarism detection task within relatively short timescales, in contrast to the previously reported state of the art, and still produce a reasonable degree of performance (PAN 11: 4th place, PlagDet=0.2467329, Recall=0.1500480, Precision=0.7106536, Granularity=1.0058894). At the core of our system is a simple method that avoids hash-type approaches, but about which we are unable to disclose many details because a patent application is in progress. We optimised performance using the PAN 10 collection and used the best parameters for the final submission. We anticipated broadly similar performance at PAN 11, modulo changes to the plagiarism cases, and 4th place this year put us between the participants who had placed 5th and 6th in PAN 10.
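The abstract reports four figures without saying how they relate. As a minimal sketch, assuming the PlagDet definition used by the PAN evaluation framework (the F1 of precision and recall, divided by log2(1 + granularity)), the reported overall score can be recomputed from the other three values. The function names below are illustrative only, and the code says nothing about the authors' undisclosed detection method.

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PlagDet as assumed here: F1 discounted by log2(1 + granularity).

    Granularity is at least 1; a value of exactly 1 leaves F1 unchanged.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return f1_score(precision, recall) / math.log2(1 + granularity)


# Figures quoted in the abstract for the PAN 11 submission.
score = plagdet(precision=0.7106536, recall=0.1500480, granularity=1.0058894)
print(f"{score:.4f}")  # agrees with the reported PlagDet to four decimals
```

Under this definition the low recall dominates the score: the F1 term is about 0.248 despite a precision above 0.71, and the near-ideal granularity reduces it only slightly.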

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1177
Publication status: Published - 1 Jan 2011
Externally published: Yes

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

A high-performance plagiarism detection system: Notebook for PAN at CLEF 2011. / Cooke, Neil; Gillam, Lee; Wrobel, Peter; Cooke, Henry; Khalid Al Obaidli, Fahad.

In: CEUR Workshop Proceedings, Vol. 1177, 01.01.2011.

@article{881b82f3e32b4560b19c9c45da6d6982,
title = "A high-performance plagiarism detection system Notebook for PAN at CLEF 2011",
abstract = "In this paper we report on our high-performance plagiarism detection system which is able to process the PAN plagiarism corpus for the external plagiarism detection task within relatively short timescales in contrast to previously reported state-of-the-art, and still produce a reasonable degree of performance (PAN 11, 4th place, PlagDet=0.2467329, Recall=0.1500480, Precision=0.7106536, Granularity=1.0058894). At the core of our system is a simple method which avoids the use of hash-type approaches, but about which we are unable to disclose too many details due to a patent application in progress. We optimised our performance using the PAN10 collection, and used the best parameters for the final submission. We anticipated a relatively similar performance at PAN11, modulo changes to the plagiarism cases, and 4th place this year put us between participants who had been 5th and 6th in PAN 10.",
author = "Neil Cooke and Lee Gillam and Peter Wrobel and Henry Cooke and {Khalid Al Obaidli}, Fahad",
year = "2011",
month = "1",
day = "1",
language = "English",
volume = "1177",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",

}

TY - JOUR

T1 - A high-performance plagiarism detection system Notebook for PAN at CLEF 2011

AU - Cooke, Neil

AU - Gillam, Lee

AU - Wrobel, Peter

AU - Cooke, Henry

AU - Khalid Al Obaidli, Fahad

PY - 2011/1/1

Y1 - 2011/1/1

AB - In this paper we report on our high-performance plagiarism detection system which is able to process the PAN plagiarism corpus for the external plagiarism detection task within relatively short timescales in contrast to previously reported state-of-the-art, and still produce a reasonable degree of performance (PAN 11, 4th place, PlagDet=0.2467329, Recall=0.1500480, Precision=0.7106536, Granularity=1.0058894). At the core of our system is a simple method which avoids the use of hash-type approaches, but about which we are unable to disclose too many details due to a patent application in progress. We optimised our performance using the PAN10 collection, and used the best parameters for the final submission. We anticipated a relatively similar performance at PAN11, modulo changes to the plagiarism cases, and 4th place this year put us between participants who had been 5th and 6th in PAN 10.

UR - http://www.scopus.com/inward/record.url?scp=84922032404&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922032404&partnerID=8YFLogxK

M3 - Conference article

VL - 1177

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -