Recursive style breach detection with multifaceted ensemble learning

Daniel Kopev, Dimitrina Zlatkova, Kristiyan Mitov, Atanas Atanasov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a supervised approach for style change detection, which aims to predict whether the style changes within a given text document and to find the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features engineered specifically for the task, and we make predictions via an ensemble of diverse classifiers, including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that a style change is present, we apply it recursively to find the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.
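The recursive localization step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation. It assumes the document is pre-split into units (e.g. paragraphs) and that each segment flagged as containing a change is bisected. It also replaces the paper's TF.IDF and engineered features, and its trained SVM/Random Forest/AdaBoost/MLP/LightGBM ensemble, with two made-up heuristic scorers (`word_length_scorer`, `comma_rate_scorer`). These are combined by plain soft voting (`soft_vote`), a simpler scheme than the stacking ensemble named in the keywords. All function names and thresholds are assumptions.

```python
from statistics import mean
from typing import Callable, List, Sequence

# A scorer maps a segment (a list of text units, e.g. paragraphs) to an
# estimated probability that the segment contains a style change.
Scorer = Callable[[Sequence[str]], float]
Detector = Callable[[Sequence[str]], bool]

PUNCT = ",.;:!?\"'()"


def _spread(values: List[float]) -> float:
    """Range of a per-unit feature across the segment (0.0 if empty)."""
    return max(values) - min(values) if values else 0.0


def word_length_scorer(units: Sequence[str]) -> float:
    """Hypothetical heuristic: a large spread in mean word length suggests a change."""
    per_unit = [mean(len(w.strip(PUNCT)) for w in u.split()) for u in units if u.split()]
    return min(1.0, _spread(per_unit) / 4.0)


def comma_rate_scorer(units: Sequence[str]) -> float:
    """Hypothetical heuristic: a large spread in commas per word suggests a change."""
    per_unit = [u.count(",") / len(u.split()) for u in units if u.split()]
    return min(1.0, _spread(per_unit) / 0.5)


def soft_vote(scorers: Sequence[Scorer], threshold: float = 0.5) -> Detector:
    """Average the scorers' probabilities; flag a change if the mean reaches the threshold."""
    def detect(units: Sequence[str]) -> bool:
        return mean(s(units) for s in scorers) >= threshold
    return detect


def find_breaches(units: Sequence[str], detect: Detector, offset: int = 0) -> List[int]:
    """Return indices i (relative to the full document) such that a style change is
    predicted between units[i - 1] and units[i]. Segments that the detector judges
    homogeneous are not searched any further."""
    if len(units) < 2 or not detect(units):
        return []
    if len(units) == 2:
        return [offset + 1]
    mid = len(units) // 2
    # Boundaries strictly inside the left half.
    breaches = find_breaches(units[:mid], detect, offset)
    # The boundary at the split point, judged from the two units that meet there.
    if detect(units[mid - 1:mid + 1]):
        breaches.append(offset + mid)
    # Boundaries strictly inside the right half.
    breaches += find_breaches(units[mid:], detect, offset + mid)
    return breaches


if __name__ == "__main__":
    doc = [
        "the cat sat on a mat",
        "a dog ran to the big red car",
        "notwithstanding, considerable, philosophical, ramifications",
        "extraordinarily, complicated, jurisprudential, considerations",
    ]
    detect = soft_vote([word_length_scorer, comma_rate_scorer])
    print(find_breaches(doc, detect))  # [2]
```

Checking the split boundary with a two-unit window means a change that falls exactly at a midpoint is not lost. Because homogeneous segments are pruned, a document predicted to have a single author costs only one detector call.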

Original language: English
Title of host publication: Artificial Intelligence
Subtitle of host publication: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings
Editors: Josef van Genabith, Gennady Agre, Thierry Declerck
Publisher: Springer Verlag
Pages: 126-137
Number of pages: 12
ISBN (Print): 9783319993430
DOIs: https://doi.org/10.1007/978-3-319-99344-7_12
Publication status: Published - 1 Jan 2018
Event: 18th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, AIMSA 2018 - Varna, Bulgaria
Duration: 12 Sep 2018 to 14 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11089 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, AIMSA 2018
Country: Bulgaria
City: Varna
Period: 12/9/18 to 14/9/18

Fingerprint

Ensemble Learning
Adaptive boosting
Classifiers
Change Detection
Random Forest
AdaBoost
Ensemble
Classifier
Style
Prediction

Keywords

  • Gradient boosting machines
  • Multi-authorship
  • Natural language processing
  • Stacking ensemble
  • Style breach detection
  • Style change detection
  • Stylometry

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Kopev, D., Zlatkova, D., Mitov, K., Atanasov, A., Hardalov, M., Koychev, I., & Nakov, P. (2018). Recursive style breach detection with multifaceted ensemble learning. In J. van Genabith, G. Agre, & T. Declerck (Eds.), Artificial Intelligence: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings (pp. 126-137). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11089 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-319-99344-7_12

Recursive style breach detection with multifaceted ensemble learning. / Kopev, Daniel; Zlatkova, Dimitrina; Mitov, Kristiyan; Atanasov, Atanas; Hardalov, Momchil; Koychev, Ivan; Nakov, Preslav.

Artificial Intelligence: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings. ed. / Josef van Genabith; Gennady Agre; Thierry Declerck. Springer Verlag, 2018. p. 126-137 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11089 LNAI).

Kopev, D, Zlatkova, D, Mitov, K, Atanasov, A, Hardalov, M, Koychev, I & Nakov, P 2018, Recursive style breach detection with multifaceted ensemble learning. in J van Genabith, G Agre & T Declerck (eds), Artificial Intelligence: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11089 LNAI, Springer Verlag, pp. 126-137, 18th International Conference on Artificial Intelligence: Methodology, Systems, and Applications, AIMSA 2018, Varna, Bulgaria, 12/9/18. https://doi.org/10.1007/978-3-319-99344-7_12
Kopev D, Zlatkova D, Mitov K, Atanasov A, Hardalov M, Koychev I et al. Recursive style breach detection with multifaceted ensemble learning. In van Genabith J, Agre G, Declerck T, editors, Artificial Intelligence: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings. Springer Verlag. 2018. p. 126-137. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-99344-7_12
Kopev, Daniel ; Zlatkova, Dimitrina ; Mitov, Kristiyan ; Atanasov, Atanas ; Hardalov, Momchil ; Koychev, Ivan ; Nakov, Preslav. / Recursive style breach detection with multifaceted ensemble learning. Artificial Intelligence: Methodology, Systems, and Applications - 18th International Conference, AIMSA 2018, Proceedings. editor / Josef van Genabith ; Gennady Agre ; Thierry Declerck. Springer Verlag, 2018. pp. 126-137 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{9818f6d97c624a26bed0ed7843cd7eeb,
title = "Recursive style breach detection with multifaceted ensemble learning",
abstract = "We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that style change is present, we apply it recursively, looking to find the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.",
keywords = "Gradient boosting machines, Multi-authorship, Natural language processing, Stacking ensemble, Style breach detection, Style change detection, Stylometry",
author = "Daniel Kopev and Dimitrina Zlatkova and Kristiyan Mitov and Atanas Atanasov and Momchil Hardalov and Ivan Koychev and Preslav Nakov",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-319-99344-7_12",
language = "English",
isbn = "9783319993430",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "126--137",
editor = "{van Genabith}, Josef and Gennady Agre and Thierry Declerck",
booktitle = "Artificial Intelligence",

}

TY - GEN

T1 - Recursive style breach detection with multifaceted ensemble learning

AU - Kopev, Daniel

AU - Zlatkova, Dimitrina

AU - Mitov, Kristiyan

AU - Atanasov, Atanas

AU - Hardalov, Momchil

AU - Koychev, Ivan

AU - Nakov, Preslav

PY - 2018/1/1

Y1 - 2018/1/1

N2 - We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that style change is present, we apply it recursively, looking to find the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.

AB - We present a supervised approach for style change detection, which aims at predicting whether there are changes in the style in a given text document, as well as at finding the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that style change is present, we apply it recursively, looking to find the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.

KW - Gradient boosting machines

KW - Multi-authorship

KW - Natural language processing

KW - Stacking ensemble

KW - Style breach detection

KW - Style change detection

KW - Stylometry

UR - http://www.scopus.com/inward/record.url?scp=85053156958&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053156958&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-99344-7_12

DO - 10.1007/978-3-319-99344-7_12

M3 - Conference contribution

SN - 9783319993430

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 126

EP - 137

BT - Artificial Intelligence

A2 - van Genabith, Josef

A2 - Agre, Gennady

A2 - Declerck, Thierry

PB - Springer Verlag

ER -