Distributed Adaptive Model Rules for mining big data streams

Anh Thu Vu, Gianmarco Morales, Joao Gama, Albert Bifet

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30,000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.
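The rule structure described in the abstract (an antecedent that is a conjunction of conditions on attribute values, and a consequent that is a linear combination of the attributes) can be illustrated with a minimal Python sketch. This is not the SAMOA implementation or its API: the class names `Condition` and `Rule`, the attribute names, the thresholds, and the weights are all hypothetical, and the sketch shows only how a fixed rule is evaluated, not how AMRules learns or distributes rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Condition:
    """One test on a single attribute, e.g. temp > 20.0 (hypothetical form)."""
    attribute: str
    op: str            # either "<=" or ">"
    threshold: float

    def holds(self, x: Dict[str, float]) -> bool:
        value = x[self.attribute]
        return value <= self.threshold if self.op == "<=" else value > self.threshold


@dataclass
class Rule:
    """Antecedent: conjunction of conditions. Consequent: linear model."""
    antecedent: List[Condition]
    weights: Dict[str, float]
    bias: float = 0.0

    def covers(self, x: Dict[str, float]) -> bool:
        # The rule fires only if every condition in the conjunction holds.
        return all(c.holds(x) for c in self.antecedent)

    def predict(self, x: Dict[str, float]) -> float:
        # Linear combination of the attribute values plus an intercept.
        return self.bias + sum(w * x[a] for a, w in self.weights.items())


# Illustrative values only; none of these come from the paper.
rule = Rule(
    antecedent=[Condition("temp", ">", 20.0), Condition("humidity", "<=", 0.5)],
    weights={"temp": 2.0, "humidity": -4.0},
    bias=1.0,
)

covered = {"temp": 25.0, "humidity": 0.25}
not_covered = {"temp": 15.0, "humidity": 0.25}

if rule.covers(covered):
    print(rule.predict(covered))      # 1.0 + 2.0*25.0 - 4.0*0.25 = 50.0
print(rule.covers(not_covered))       # False: temp > 20.0 fails
```

Read this way, a rule stays comprehensible because its antecedent is a short list of readable tests and its consequent is a single weighted sum.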

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 345-353
Number of pages: 9
ISBN (Electronic): 9781479956654
DOI: https://doi.org/10.1109/BigData.2014.7004251
Publication status: Published - 7 Jan 2015
Externally published: Yes
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington
Duration: 27 Oct 2014 to 30 Oct 2014

Other

Other: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
City: Washington
Period: 27/10/14 to 30/10/14

Fingerprint

Program processors
Data mining
Data storage equipment
Big data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Vu, A. T., Morales, G., Gama, J., & Bifet, A. (2015). Distributed Adaptive Model Rules for mining big data streams. In Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014 (pp. 345-353). [7004251] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2014.7004251

Distributed Adaptive Model Rules for mining big data streams. / Vu, Anh Thu; Morales, Gianmarco; Gama, Joao; Bifet, Albert.

Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc., 2015. p. 345-353 7004251.

Vu, AT, Morales, G, Gama, J & Bifet, A 2015, Distributed Adaptive Model Rules for mining big data streams. in Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014., 7004251, Institute of Electrical and Electronics Engineers Inc., pp. 345-353, 2nd IEEE International Conference on Big Data, IEEE Big Data 2014, Washington, 27/10/14. https://doi.org/10.1109/BigData.2014.7004251
Vu AT, Morales G, Gama J, Bifet A. Distributed Adaptive Model Rules for mining big data streams. In Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc. 2015. p. 345-353. 7004251 https://doi.org/10.1109/BigData.2014.7004251
Vu, Anh Thu ; Morales, Gianmarco ; Gama, Joao ; Bifet, Albert. / Distributed Adaptive Model Rules for mining big data streams. Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 345-353
@inproceedings{162a7a5dd9e846f4a64c751a3c739e4e,
title = "Distributed Adaptive Model Rules for mining big data streams",
abstract = "Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in samoa (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.",
author = "Vu, {Anh Thu} and Gianmarco Morales and Joao Gama and Albert Bifet",
year = "2015",
month = "1",
day = "7",
doi = "10.1109/BigData.2014.7004251",
language = "English",
pages = "345--353",
booktitle = "Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Distributed Adaptive Model Rules for mining big data streams

AU - Vu, Anh Thu

AU - Morales, Gianmarco

AU - Gama, Joao

AU - Bifet, Albert

PY - 2015/1/7

Y1 - 2015/1/7

AB - Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in samoa (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.

UR - http://www.scopus.com/inward/record.url?scp=84921726705&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84921726705&partnerID=8YFLogxK

U2 - 10.1109/BigData.2014.7004251

DO - 10.1109/BigData.2014.7004251

M3 - Conference contribution

SP - 345

EP - 353

BT - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -