Towards the exploitation of statistical language models for plagiarism detection with reference

Alberto Barron, Paolo Rosso

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

To plagiarise is to rob another person of the credit for their work. In particular, plagiarism in text means including text fragments (or even an entire document) by another author without giving that author due credit. In this work we describe our first attempt to detect plagiarised segments in a text using statistical language models (LMs) and perplexity. The preliminary experiments, carried out on two corpora, specialised and literary (including original, part-of-speech and stemmed versions), show that the perplexity of a text segment, given a language model estimated over an author's text, is a relevant feature for plagiarism detection.
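The central quantity in the abstract is the perplexity of a text segment under a language model estimated over an author's text. The following is a minimal sketch of that computation. It rests on several assumptions, since the abstract does not give the n-gram order, smoothing, or decision rule: a word-bigram model with add-one smoothing, whitespace tokenisation, a model estimated over the reference (candidate source) text, and a rule that flags a segment as a copying candidate when its perplexity falls below a hypothetical `threshold`. The names `BigramLM`, `tokenize` and `flag_segments` are illustrative only.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased whitespace tokenisation (a simplifying assumption)."""
    return text.lower().split()


class BigramLM:
    """Word-bigram language model with add-one (Laplace) smoothing.

    Bigram order and smoothing are illustrative assumptions, not
    necessarily the configuration used in the paper.
    """

    def __init__(self, tokens: list[str]) -> None:
        padded = ["<s>"] + tokens
        # How often each word occurs as the left context of a bigram.
        self.context_counts = Counter(padded[:-1])
        self.bigram_counts = Counter(zip(padded[:-1], padded[1:]))
        # Observed vocabulary plus one slot shared by all unseen words.
        self.vocab_size = len(set(tokens)) + 1

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = (c(prev, word) + 1) / (c(prev) + V)."""
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + self.vocab_size
        )

    def perplexity(self, tokens: list[str]) -> float:
        """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), with w_0 = <s>."""
        if not tokens:
            raise ValueError("cannot score an empty segment")
        padded = ["<s>"] + tokens
        log_prob = sum(
            math.log(self.prob(prev, word))
            for prev, word in zip(padded[:-1], padded[1:])
        )
        return math.exp(-log_prob / len(tokens))


def flag_segments(
    reference: str, segments: list[str], threshold: float
) -> list[tuple[str, float, bool]]:
    """Score each segment against an LM estimated over the reference text.

    Hypothetical decision rule: a segment whose perplexity falls below
    `threshold` is flagged as a candidate for having been copied.
    """
    lm = BigramLM(tokenize(reference))
    results = []
    for segment in segments:
        pp = lm.perplexity(tokenize(segment))
        results.append((segment, pp, pp < threshold))
    return results


if __name__ == "__main__":
    reference = "the cat sat on the mat and the dog sat on the rug"
    segments = ["the cat sat on the rug", "quantum fields permeate all of spacetime"]
    for segment, pp, flagged in flag_segments(reference, segments, threshold=7.0):
        print(f"{pp:6.2f}  flagged={flagged}  {segment!r}")
```

The model only ever sees token sequences. The same code could therefore score the stemmed or part-of-speech versions of a corpus, as mentioned in the abstract, by replacing `tokenize` with a stemmer or tagger.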

Original language: English
Title of host publication: CEUR Workshop Proceedings
Pages: 15-19
Number of pages: 5
Volume: 377
Publication status: Published - 2008
Externally published: Yes
Event: 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008 - Patras, Greece
Duration: 22 Jul 2008 → 22 Jul 2008

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Barron, A., & Rosso, P. (2008). Towards the exploitation of statistical language models for plagiarism detection with reference. In CEUR Workshop Proceedings (Vol. 377, pp. 15-19).

Towards the exploitation of statistical language models for plagiarism detection with reference. / Barron, Alberto; Rosso, Paolo.

CEUR Workshop Proceedings. Vol. 377 2008. p. 15-19.

Barron, A & Rosso, P 2008, Towards the exploitation of statistical language models for plagiarism detection with reference. in CEUR Workshop Proceedings. vol. 377, pp. 15-19, 18th European Conference on Artificial Intelligence, ECAI 2008 - Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2008, Patras, Greece, 22/7/08.

@inproceedings{17f6fd19e70c4845983761f54e82f36f,
title = "Towards the exploitation of statistical language models for plagiarism detection with reference",
author = "Alberto Barron and Paolo Rosso",
year = "2008",
language = "English",
volume = "377",
pages = "15--19",
booktitle = "CEUR Workshop Proceedings",

}

TY  - GEN
T1  - Towards the exploitation of statistical language models for plagiarism detection with reference
AU  - Barron, Alberto
AU  - Rosso, Paolo
PY  - 2008
Y1  - 2008
UR  - http://www.scopus.com/inward/record.url?scp=84885222499&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84885222499&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 377
SP  - 15
EP  - 19
BT  - CEUR Workshop Proceedings
ER  - 