Overview of the author identification task at PAN 2014

Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, Alberto Barron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Citations (Scopus)

Abstract

The author identification task at PAN-2014 focuses on author verification. As at PAN-2013, we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author or not. In comparison to PAN-2013, a significantly larger corpus was built, comprising hundreds of documents in four natural languages (Dutch, English, Greek, and Spanish) and four genres (essays, reviews, novels, and opinion articles). In addition, more suitable performance measures are used, focusing on the accuracy and confidence of the predictions as well as on the ability of the submitted methods to leave some problems unanswered when there is great uncertainty. To this end, we adopt the c@1 measure, originally proposed for the question answering task. We received 13 software submissions, which were evaluated in the TIRA framework. Analytical evaluation results are presented, in which one language-independent approach serves as a challenging baseline. Moreover, we continue the successful practice of the PAN labs of examining meta-models based on the combination of all submitted systems. Last but not least, we provide statistical significance tests to demonstrate the important differences between the submitted approaches.
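For readers unfamiliar with it, c@1 extends plain accuracy by crediting unanswered problems at the accuracy observed over the whole problem set, so abstaining pays off only when the answers a system does give are mostly correct. A minimal sketch of the standard definition (Peñas and Rodrigo, 2011) in Python; the function name and the True/False/None encoding of per-problem outcomes are illustrative, not taken from the paper:

    def c_at_1(outcomes):
        # outcomes: one entry per verification problem --
        #   True  -> answered correctly
        #   False -> answered incorrectly
        #   None  -> left unanswered
        n = len(outcomes)
        n_correct = sum(1 for o in outcomes if o is True)
        n_unanswered = sum(1 for o in outcomes if o is None)
        # Unanswered problems are credited at the accuracy observed on
        # the full set: c@1 = (n_c + n_u * n_c / n) / n.
        return (n_correct + n_unanswered * n_correct / n) / n

    # Example: 6 correct, 2 wrong, 2 unanswered out of 10 problems.
    print(c_at_1([True] * 6 + [False] * 2 + [None] * 2))  # 0.72

With no unanswered problems, c@1 reduces to plain accuracy; a system that answers nothing scores 0.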

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Pages: 877-897
Number of pages: 21
Volume: 1180
Publication status: Published - 2014
Externally published: Yes
Event: 2014 Working Notes for CLEF Conference, CLEF 2014 - Sheffield, United Kingdom
Duration: 15 Sep 2014 - 18 Sep 2014

Other

Other: 2014 Working Notes for CLEF Conference, CLEF 2014
Country: United Kingdom
City: Sheffield
Period: 15/9/14 - 18/9/14

Fingerprint

  • Statistical tests
  • Uncertainty

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Stamatatos, E., Daelemans, W., Verhoeven, B., Potthast, M., Stein, B., Juola, P., ... Barron, A. (2014). Overview of the author identification task at PAN 2014. In CEUR Workshop Proceedings (Vol. 1180, pp. 877-897). CEUR-WS.
