VHT: Vertical Hoeffding Tree

Nicolas Kourtellis, Gianmarco Morales, Albert Bifet, Arinto Murdopo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

15 Citations (Scopus)

Abstract

IoT big data requires new machine learning methods that can scale to large volumes of data arriving at high speed. Decision trees are popular machine learning models because they are very effective, yet easy to interpret and visualize. The literature offers distributed algorithms for learning decision trees, and also streaming algorithms, but no algorithms that combine both features. In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining big data streams, and is thus able to run on real-world clusters. Our experiments on the accuracy and throughput of VHT show that it scales while attaining superior performance compared to sequential decision trees.
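The abstract names vertical parallelism without spelling it out. The following is a rough single-process sketch of the idea, assuming (1) the classic Hoeffding-tree split test, which splits a leaf once the gap between the two best attributes' information gain exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)), and (2) that "vertical" means partitioning the attributes of each instance across workers that each keep statistics only for their own attributes. The class names `ModelAggregator` and `LocalStatistics`, the attribute partition, delta = 1e-7, and the toy stream are all hypothetical; this is not the authors' implementation or the Apache SAMOA API, and a real deployment would exchange messages between distributed processors rather than call methods.

```python
import math
from collections import defaultdict


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(counts) -> float:
    """Shannon entropy in bits of a {label: count} mapping."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


class LocalStatistics:
    """Hypothetical worker owning the statistics of one vertical slice
    (a subset of attributes) for a single leaf."""

    def __init__(self, attributes):
        self.attributes = list(attributes)
        # counts[attribute][attribute_value][label] -> instances seen
        self.counts = {a: defaultdict(lambda: defaultdict(int)) for a in self.attributes}

    def update(self, attr_slice, label):
        for a in self.attributes:
            self.counts[a][attr_slice[a]][label] += 1

    def best_two(self, class_counts):
        """Top two (information_gain, attribute) pairs among local attributes."""
        n = sum(class_counts.values())
        base = entropy(class_counts)
        gains = []
        for a in self.attributes:
            after = sum(sum(lc.values()) / n * entropy(lc) for lc in self.counts[a].values())
            gains.append((base - after, a))
        return sorted(gains, reverse=True)[:2]


class ModelAggregator:
    """Hypothetical coordinator for one leaf: tracks class counts, sends each
    worker only its attribute slice, and applies the Hoeffding split test."""

    def __init__(self, partitions, num_classes=2, delta=1e-7):
        self.workers = [LocalStatistics(p) for p in partitions]
        self.class_counts = defaultdict(int)
        self.n = 0
        self.value_range = math.log2(num_classes)  # R: range of information gain
        self.delta = delta

    def learn(self, instance, label):
        self.n += 1
        self.class_counts[label] += 1
        for w in self.workers:
            # Vertical parallelism: a worker never sees attributes it does not own.
            w.update({a: instance[a] for a in w.attributes}, label)

    def try_split(self):
        """Attribute to split on, or None while the evidence is insufficient."""
        if self.n == 0:
            return None
        candidates = [c for w in self.workers for c in w.best_two(self.class_counts)]
        if len(candidates) < 2:
            return None
        candidates.sort(reverse=True)
        (g1, a1), (g2, _) = candidates[0], candidates[1]
        eps = hoeffding_bound(self.value_range, self.delta, self.n)
        return a1 if g1 - g2 > eps else None


def synthetic_stream(n):
    """Toy stream: x0 equals the label; x1..x3 carry no information about
    the label when n is a multiple of 60."""
    for i in range(n):
        yield {"x0": i % 2, "x1": i % 3, "x2": (i // 2) % 2, "x3": i % 5}, i % 2


def run(n):
    agg = ModelAggregator(partitions=[["x0", "x1"], ["x2", "x3"]])
    for instance, label in synthetic_stream(n):
        agg.learn(instance, label)
    return agg.try_split()


print(run(4))     # None: with n = 4, epsilon exceeds any possible gain gap
print(run(1200))  # x0
```

In this sketch each worker only ever sees its own attribute slice, so the per-attribute statistics are spread across workers, while the aggregator receives just two candidate gains per worker before deciding whether to split.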

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 915-922
Number of pages: 8
ISBN (Electronic): 9781467390040
DOI: https://doi.org/10.1109/BigData.2016.7840687
Publication status: Published - 2 Feb 2017
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: 5 Dec 2016 to 8 Dec 2016

Keywords

  • Apache SAMOA
  • big data
  • distributed streaming decision tree
  • Hoeffding tree
  • IoT
  • vertical parallelism

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Citation

Kourtellis, N., Morales, G., Bifet, A., & Murdopo, A. (2017). VHT: Vertical Hoeffding tree. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 915-922). [7840687] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2016.7840687