Towards the exploitation of statistical language models for plagiarism detection with reference

Research output: Contribution to journal / Conference article

1 Citation (Scopus)


To plagiarise is to take credit for another person's work. In text, plagiarism means including text fragments (or even an entire document) from an author without giving them the corresponding credit. In this work we describe our first attempt to detect plagiarised segments in a text by employing statistical Language Models (LMs) and perplexity. The preliminary experiments, carried out on two specialised and literary corpora (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a Language Model computed over an author's text, is a relevant feature in plagiarism detection.
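
To make the idea of segment perplexity concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bigram model with add-one smoothing and whitespace tokenisation; the abstract does not specify the model order, smoothing method or preprocessing, and the function names `train_bigram_lm` and `perplexity` are hypothetical. A segment that the author's model finds surprising (high perplexity) is a candidate for closer inspection.

```python
import math
from collections import Counter


def train_bigram_lm(tokens):
    """Collect bigram statistics over an author's reference text."""
    contexts = Counter(tokens[:-1])             # how often each word is followed by another
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent word pairs
    vocab_size = len(set(tokens)) + 1           # +1 slot for any unseen word (assumption)
    return contexts, bigrams, vocab_size


def perplexity(segment, model):
    """Perplexity of a token segment under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    pairs = list(zip(segment, segment[1:]))
    if not pairs:
        raise ValueError("segment needs at least two tokens")
    log_prob = 0.0
    for prev, word in pairs:
        # Add-one (Laplace) smoothing keeps unseen bigrams from having zero probability.
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(pairs))


# Toy data, invented purely for illustration.
author_text = "the cat sat on the mat and the cat slept on the mat".split()
model = train_bigram_lm(author_text)

own_segment = "the cat sat on the mat".split()
foreign_segment = "quantum fields permeate curved spacetime".split()

print(perplexity(own_segment, model))
print(perplexity(foreign_segment, model))
```

On this toy data the segment written in the author's own vocabulary and word order scores a lower perplexity than the foreign one. In practice a decision threshold would have to be tuned on held-out text, and the same computation could be run over part-of-speech or stemmed versions of the corpus.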

Original language: English
Pages (from-to): 15-19
Number of pages: 5
Journal: CEUR Workshop Proceedings
Publication status: Published - 1 Dec 2008
Event: 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008 - Patras, Greece
Duration: 22 Jul 2008 → 22 Jul 2008


ASJC Scopus subject areas

  • Computer Science (all)
