VHT

Vertical Hoeffding Tree

Nicolas Kourtellis, Gianmarco Morales, Albert Bifet, Arinto Murdopo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

IoT big data requires new machine learning methods able to scale to large volumes of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining big data streams, and is thus able to run on real-world clusters. Our experiments on the accuracy and throughput of VHT demonstrate its ability to scale while attaining superior performance compared to sequential decision trees.
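As a rough illustration of the two ideas named in the abstract, the sketch below shows the standard Hoeffding bound that a Hoeffding tree uses to decide when a leaf has seen enough examples to split, and one simple way of spreading attributes across workers. The function names, the round-robin assignment, and the example numbers are assumptions made for illustration; they are not taken from the paper or from Apache SAMOA.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range is R (range of the split criterion), delta is the allowed
    error probability, n is the number of examples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def assign_attributes(num_attributes: int, num_workers: int) -> list[list[int]]:
    """Hypothetical vertical partitioning: attribute i goes to worker i % num_workers.

    Each worker would then keep split statistics only for its own attributes.
    """
    shards: list[list[int]] = [[] for _ in range(num_workers)]
    for attr in range(num_attributes):
        shards[attr % num_workers].append(attr)
    return shards


if __name__ == "__main__":
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))
    print(assign_attributes(5, 2))
```

In this reading, "vertical" means that the data is partitioned by attribute (column) and not by instance (row), so each worker evaluates candidate splits for a subset of the attributes.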

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 915-922
Number of pages: 8
ISBN (Electronic): 9781467390040
DOI: https://doi.org/10.1109/BigData.2016.7840687
Publication status: Published - 2 Feb 2017
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: 5 Dec 2016 to 8 Dec 2016

Other

Other: 4th IEEE International Conference on Big Data, Big Data 2016
Country: United States
City: Washington
Period: 5/12/16 to 8/12/16

Fingerprint

Decision trees
Learning systems
Parallel algorithms
Throughput
Experiments
Big data

Keywords

  • Apache SAMOA
  • big data
  • distributed streaming decision tree
  • Hoeffding tree
  • IoT
  • vertical parallelism

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Cite this

Kourtellis, N., Morales, G., Bifet, A., & Murdopo, A. (2017). VHT: Vertical Hoeffding tree. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 915-922). [7840687] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2016.7840687

@inproceedings{c4fd75d46a38473db6d6a196dc47357e,
title = "VHT: Vertical {H}oeffding tree",
abstract = "IoT big data requires new machine learning methods able to scale to large size of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining big data streams, and thus able to run on real-world clusters. Our experiments to study the accuracy and throughput of VHT prove its ability to scale while attaining superior performance compared to sequential decision trees.",
keywords = "Apache SAMOA, big data, distributed streaming decision tree, Hoeffding tree, IoT, vertical parallelism",
author = "Nicolas Kourtellis and Gianmarco Morales and Albert Bifet and Arinto Murdopo",
year = "2017",
month = "2",
day = "2",
doi = "10.1109/BigData.2016.7840687",
language = "English",
pages = "915--922",
booktitle = "Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - VHT

T2 - Vertical Hoeffding tree

AU - Kourtellis, Nicolas

AU - Morales, Gianmarco

AU - Bifet, Albert

AU - Murdopo, Arinto

PY - 2017/2/2

Y1 - 2017/2/2

N2 - IoT big data requires new machine learning methods able to scale to large size of data arriving at high speed. Decision trees are popular machine learning models since they are very effective, yet easy to interpret and visualize. In the literature, we can find distributed algorithms for learning decision trees, and also streaming algorithms, but not algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining big data streams, and thus able to run on real-world clusters. Our experiments to study the accuracy and throughput of VHT prove its ability to scale while attaining superior performance compared to sequential decision trees.

KW - Apache SAMOA

KW - big data

KW - distributed streaming decision tree

KW - Hoeffding tree

KW - IoT

KW - vertical parallelism

UR - http://www.scopus.com/inward/record.url?scp=85015214889&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85015214889&partnerID=8YFLogxK

U2 - 10.1109/BigData.2016.7840687

DO - 10.1109/BigData.2016.7840687

M3 - Conference contribution

SP - 915

EP - 922

BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -