An ensemble-rich multi-aspect approach for robust style change detection

Notebook for PAN at CLEF-2018

Dimitrina Zlatkova, Daniel Kopev, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

Research output: Contribution to journal › Conference article

1 Citation (Scopus)

Abstract

We describe the winning system for the PAN@CLEF 2018 task on Style Change Detection. Given a document, the goal is to determine whether it contains a style change. We present our supervised approach, which combines a TF.IDF representation of the documents with features specifically engineered for the task and makes predictions using an ensemble of diverse models including SVM, Random Forest, AdaBoost, MLP and LightGBM. We further perform a comparative analysis of the models' performance on three different datasets, two of which we developed for the task. Moreover, we release our code to enable further research.
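The abstract names two ingredients: a TF.IDF representation combined with task-specific features, and an ensemble of diverse classifiers. As a rough illustration only, the stdlib-only Python sketch below computes one common TF.IDF variant (tf × log(N/df)), appends two hypothetical stylometric features, and combines base-model probabilities by weighted averaging (soft voting). The function names, the feature choices, the weighting formula and the averaging step are all assumptions, not the authors' released code. The base models themselves (SVM, Random Forest, AdaBoost, MLP, LightGBM) are not reproduced, and the stacking ensemble mentioned in the keywords would instead train a meta-classifier on the base models' outputs rather than averaging them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simplification of real preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(docs):
    """One sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def style_features(text):
    """Hypothetical hand-engineered stylometric features (illustrative only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }


def combine(tfidf_vec, features):
    """Concatenate TF-IDF weights and style features into one sparse vector."""
    merged = dict(tfidf_vec)
    # Prefix feature names so they cannot collide with vocabulary terms.
    merged.update({f"style:{name}": value for name, value in features.items()})
    return merged


def soft_vote(probs, weights=None):
    """Weighted average of base-model probabilities for the 'style change' class."""
    weights = weights or [1.0] * len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def predict_change(probs, weights=None, threshold=0.5):
    """True if the averaged probability reaches the decision threshold."""
    return soft_vote(probs, weights) >= threshold


if __name__ == "__main__":
    docs = ["The cat sat. It was calm.", "The dog ran! Then it stopped."]
    x = combine(tfidf(docs)[0], style_features(docs[0]))
    print(sorted(x))
    # Placeholder probabilities standing in for trained base models.
    print(predict_change([0.2, 0.9, 0.7]))  # True
```

In a real pipeline the combined vectors would be fed to each trained base classifier, and their per-document probabilities would replace the placeholder list passed to `predict_change`.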

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2125
Publication status: Published - 1 Jan 2018
Event: 19th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2018 - Avignon, France
Duration: 10 Sep 2018 to 14 Sep 2018

Keywords

  • Multi-authorship
  • Natural Language Processing
  • Gradient boosting machines
  • Deep Learning
  • Stacking ensemble
  • Style change
  • Stylometry

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Zlatkova, D., Kopev, D., Mitov, K., Atanasov, A., Hardalov, M., Koychev, I., & Nakov, P. (2018). An ensemble-rich multi-aspect approach for robust style change detection: Notebook for PAN at CLEF-2018. CEUR Workshop Proceedings, 2125.

An ensemble-rich multi-aspect approach for robust style change detection : Notebook for PAN at CLEF-2018. / Zlatkova, Dimitrina; Kopev, Daniel; Mitov, Kristiyan; Atanasov, Atanas; Hardalov, Momchil; Koychev, Ivan; Nakov, Preslav.

In: CEUR Workshop Proceedings, Vol. 2125, 01.01.2018.

Zlatkova D, Kopev D, Mitov K, Atanasov A, Hardalov M, Koychev I et al. An ensemble-rich multi-aspect approach for robust style change detection: Notebook for PAN at CLEF-2018. CEUR Workshop Proceedings. 2018 Jan 1;2125.
Zlatkova, Dimitrina ; Kopev, Daniel ; Mitov, Kristiyan ; Atanasov, Atanas ; Hardalov, Momchil ; Koychev, Ivan ; Nakov, Preslav. / An ensemble-rich multi-aspect approach for robust style change detection : Notebook for PAN at CLEF-2018. In: CEUR Workshop Proceedings. 2018 ; Vol. 2125.
@article{a2f56437adaf4fb2ae0a1d67e387fa98,
title = "An ensemble-rich multi-aspect approach for robust style change detection: Notebook for PAN at CLEF-2018",
abstract = "We describe the winning system for the PAN@CLEF 2018 task on Style Change Detection. Given a document, the goal is to determine whether it contains style change. We present our supervised approach, which combines a TF.IDF representation of the documents with features specifically engineered for the task and which makes predictions using an ensemble of diverse models including SVM, Random Forest, AdaBoost, MLP and LightGBM. We further perform comparative analysis on the performance of the models on three different datasets, two of which we have developed for the task. Moreover, we release our code in order to enable further research.",
keywords = "Multi-authorship, Natural Language Processing, Gradient boosting machines, Deep Learning, Stacking ensemble, Style change, Stylometry",
author = "Dimitrina Zlatkova and Daniel Kopev and Kristiyan Mitov and Atanas Atanasov and Momchil Hardalov and Ivan Koychev and Preslav Nakov",
year = "2018",
month = "1",
day = "1",
language = "English",
volume = "2125",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",

}

TY - JOUR

T1 - An ensemble-rich multi-aspect approach for robust style change detection

T2 - Notebook for PAN at CLEF-2018

AU - Zlatkova, Dimitrina

AU - Kopev, Daniel

AU - Mitov, Kristiyan

AU - Atanasov, Atanas

AU - Hardalov, Momchil

AU - Koychev, Ivan

AU - Nakov, Preslav

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We describe the winning system for the PAN@CLEF 2018 task on Style Change Detection. Given a document, the goal is to determine whether it contains style change. We present our supervised approach, which combines a TF.IDF representation of the documents with features specifically engineered for the task and which makes predictions using an ensemble of diverse models including SVM, Random Forest, AdaBoost, MLP and LightGBM. We further perform comparative analysis on the performance of the models on three different datasets, two of which we have developed for the task. Moreover, we release our code in order to enable further research.

AB - We describe the winning system for the PAN@CLEF 2018 task on Style Change Detection. Given a document, the goal is to determine whether it contains style change. We present our supervised approach, which combines a TF.IDF representation of the documents with features specifically engineered for the task and which makes predictions using an ensemble of diverse models including SVM, Random Forest, AdaBoost, MLP and LightGBM. We further perform comparative analysis on the performance of the models on three different datasets, two of which we have developed for the task. Moreover, we release our code in order to enable further research.

KW - Multi-authorship

KW - Natural Language Processing

KW - Gradient boosting machines

KW - Deep Learning

KW - Stacking ensemble

KW - Style change

KW - Stylometry

UR - http://www.scopus.com/inward/record.url?scp=85051087910&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051087910&partnerID=8YFLogxK

M3 - Conference article

VL - 2125

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -