BoostVHT

Boosting distributed streaming decision trees

Theodore Vasiloudis, Foteini Beligianni, Gianmarco Morales

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design makes it possible to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework that can employ any existing online boosting algorithm while leveraging the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state of the art.
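The separation between boosting and base-learner training described in the abstract can be illustrated with a small sketch. This is not the BoostVHT or Apache SAMOA code: it assumes an OzaBoost-style weight update (Oza and Russell's online boosting), picked only as a familiar example of an online boosting algorithm, and it swaps the streaming decision tree for a toy nearest-centroid learner. The names `BaseLearner`, `CentroidLearner`, `OnlineBooster`, and `poisson` are hypothetical. The point is that the booster only calls `learn` and `predict`, so the object behind that interface could in principle train locally or on a cluster.

```python
import math
import random
from collections import defaultdict
from typing import Optional, Protocol, Sequence


class BaseLearner(Protocol):
    """Hypothetical interface: any incremental learner, local or distributed."""
    def learn(self, x: Sequence[float], y: int) -> None: ...
    def predict(self, x: Sequence[float]) -> Optional[int]: ...


class CentroidLearner:
    """Toy stand-in for a streaming tree: predicts the nearest running class mean."""
    def __init__(self) -> None:
        self.sums: dict = {}
        self.counts: dict = defaultdict(int)

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.sums:
            return None  # untrained: the booster counts this as a mistake

        def dist(y):
            n = self.counts[y]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[y], x))

        return min(self.sums, key=dist)


def poisson(lam, rng):
    """Sample Poisson(lam) with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class OnlineBooster:
    """OzaBoost-style booster (an assumption, not taken from the paper)."""
    def __init__(self, learners, seed=0):
        self.learners = list(learners)
        self.correct = [0.0] * len(self.learners)  # weight seen when right
        self.wrong = [0.0] * len(self.learners)    # weight seen when wrong
        self.rng = random.Random(seed)

    def learn(self, x, y):
        lam = 1.0  # example weight, passed down the chain of learners
        for m, h in enumerate(self.learners):
            for _ in range(poisson(lam, self.rng)):
                h.learn(x, y)
            if h.predict(x) == y:
                self.correct[m] += lam
                eps = self.wrong[m] / (self.correct[m] + self.wrong[m])
                lam *= 1.0 / (2.0 * (1.0 - eps))  # shrink weight for later learners
            else:
                self.wrong[m] += lam
                eps = self.wrong[m] / (self.correct[m] + self.wrong[m])
                lam *= 1.0 / (2.0 * eps)  # grow weight for later learners

    def predict(self, x):
        votes = defaultdict(float)
        for m, h in enumerate(self.learners):
            total = self.correct[m] + self.wrong[m]
            label = h.predict(x)
            if total == 0.0 or label is None:
                continue
            eps = max(self.wrong[m] / total, 1e-9)
            if eps >= 0.5:
                continue  # skip learners no better than chance
            votes[label] += math.log((1.0 - eps) / eps)
        return max(votes, key=votes.get) if votes else None


# Usage: two well-separated 1-D classes arriving as a stream.
rng = random.Random(42)
booster = OnlineBooster([CentroidLearner() for _ in range(3)], seed=1)
for i in range(200):
    y = i % 2
    booster.learn([rng.gauss(10.0 * y, 1.0)], y)
print(booster.predict([0.0]), booster.predict([10.0]))
```

Because `OnlineBooster` never looks inside a learner, replacing `CentroidLearner` with a wrapper around a parallel tree learner would leave the boosting loop unchanged, which is the kind of decoupling the abstract refers to.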

Original language: English
Title of host publication: CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 899-908
Number of pages: 10
Volume: Part F131841
ISBN (Electronic): 9781450349185
DOI: https://doi.org/10.1145/3132847.3132974
Publication status: Published - 6 Nov 2017
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 2017 → 10 Nov 2017

Other

Other: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country: Singapore
City: Singapore
Period: 6/11/17 → 10/11/17

Fingerprint

Boosting
Decision tree
Leverage
Learning algorithm
Classifier
Open source
Data streams

Keywords

  • Boosting
  • Decision trees
  • Distributed systems
  • Online learning

ASJC Scopus subject areas

  • Business, Management and Accounting (all)
  • Decision Sciences (all)

Cite this

Vasiloudis, T., Beligianni, F., & Morales, G. (2017). BoostVHT: Boosting distributed streaming decision trees. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management (Vol. Part F131841, pp. 899-908). Association for Computing Machinery. https://doi.org/10.1145/3132847.3132974

BoostVHT: Boosting distributed streaming decision trees. / Vasiloudis, Theodore; Beligianni, Foteini; Morales, Gianmarco.

CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. Vol. Part F131841 Association for Computing Machinery, 2017. p. 899-908.

Vasiloudis, T, Beligianni, F & Morales, G 2017, BoostVHT: Boosting distributed streaming decision trees. in CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. vol. Part F131841, Association for Computing Machinery, pp. 899-908, 26th ACM International Conference on Information and Knowledge Management, CIKM 2017, Singapore, Singapore, 6/11/17. https://doi.org/10.1145/3132847.3132974

Vasiloudis T, Beligianni F, Morales G. BoostVHT: Boosting distributed streaming decision trees. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. Vol. Part F131841. Association for Computing Machinery. 2017. p. 899-908. https://doi.org/10.1145/3132847.3132974

Vasiloudis, Theodore; Beligianni, Foteini; Morales, Gianmarco. / BoostVHT: Boosting distributed streaming decision trees. CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. Vol. Part F131841. Association for Computing Machinery, 2017. pp. 899-908.
@inproceedings{8468c815896d4620a3bdee2c5c5506c9,
title = "BoostVHT: Boosting distributed streaming decision trees",
abstract = "Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design makes it possible to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework that can employ any existing online boosting algorithm while leveraging the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state of the art.",
keywords = "Boosting, Decision trees, Distributed systems, Online learning",
author = "Theodore Vasiloudis and Foteini Beligianni and Gianmarco Morales",
year = "2017",
month = "11",
day = "6",
doi = "10.1145/3132847.3132974",
language = "English",
volume = "Part F131841",
pages = "899--908",
booktitle = "CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - BoostVHT

T2 - Boosting distributed streaming decision trees

AU - Vasiloudis, Theodore

AU - Beligianni, Foteini

AU - Morales, Gianmarco

PY - 2017/11/6

Y1 - 2017/11/6

N2 - Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design makes it possible to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework that can employ any existing online boosting algorithm while leveraging the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state of the art.

AB - Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently developed model-parallel learning algorithm for streaming decision trees as a base learner. This design makes it possible to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework that can employ any existing online boosting algorithm while leveraging the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can run on several distributed execution engines, and demonstrate order-of-magnitude speedups compared to the state of the art.

KW - Boosting

KW - Decision trees

KW - Distributed systems

KW - Online learning

UR - http://www.scopus.com/inward/record.url?scp=85037345394&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037345394&partnerID=8YFLogxK

U2 - 10.1145/3132847.3132974

DO - 10.1145/3132847.3132974

M3 - Conference contribution

VL - Part F131841

SP - 899

EP - 908

BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management

PB - Association for Computing Machinery

ER -