Big data stream learning with SAMOA

Albert Bifet, Gianmarco Morales

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
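The abstract's central architectural claim is that algorithms are written once against SAMOA's programming abstractions and then run on whichever distributed stream processing engine is plugged in (Storm, S4, or Samza). The Java sketch below is a hypothetical, heavily simplified illustration of that idea. It is not SAMOA's actual API. The names `Processor`, `Engine`, `LocalEngine`, `PartitionedEngine`, and `LabelCounter` are invented for this example, and `PartitionedEngine` only simulates key-based partitioning inside a single JVM rather than talking to a real engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

public class PluggableEngineSketch {

    /** Hypothetical processing unit: the algorithm sees one event at a time. */
    interface Processor<T> {
        void process(T event);
    }

    /** Hypothetical engine adapter; algorithm code depends only on Processor. */
    interface Engine {
        <T, P extends Processor<T>> List<P> run(
                Iterable<T> source, Supplier<P> factory, Function<T, Object> key, int parallelism);
    }

    /** Single-threaded adapter: one processor instance, keys ignored. */
    static final class LocalEngine implements Engine {
        @Override
        public <T, P extends Processor<T>> List<P> run(
                Iterable<T> source, Supplier<P> factory, Function<T, Object> key, int parallelism) {
            P instance = factory.get();
            for (T event : source) {
                instance.process(event);
            }
            return List.of(instance);
        }
    }

    /**
     * Stand-in for a distributed adapter: creates `parallelism` processor
     * instances and routes each event by the hash of its key. Everything
     * runs in one JVM; no real cluster is involved.
     */
    static final class PartitionedEngine implements Engine {
        @Override
        public <T, P extends Processor<T>> List<P> run(
                Iterable<T> source, Supplier<P> factory, Function<T, Object> key, int parallelism) {
            List<P> instances = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                instances.add(factory.get());
            }
            for (T event : source) {
                int target = Math.floorMod(key.apply(event).hashCode(), parallelism);
                instances.get(target).process(event);
            }
            return instances;
        }
    }

    /** Toy algorithm: counts class labels, the state behind a majority-class baseline. */
    static final class LabelCounter implements Processor<String> {
        final Map<String, Long> counts = new HashMap<>();

        @Override
        public void process(String label) {
            counts.merge(label, 1L, Long::sum);
        }
    }

    /** Runs the same algorithm on any engine and merges the partial counts. */
    static Map<String, Long> countLabels(Engine engine, List<String> stream, int parallelism) {
        List<LabelCounter> partials = engine.run(stream, LabelCounter::new, label -> label, parallelism);
        Map<String, Long> total = new HashMap<>();
        for (LabelCounter partial : partials) {
            partial.counts.forEach((label, n) -> total.merge(label, n, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> stream = List.of("spam", "ham", "ham", "spam", "ham");
        Map<String, Long> local = countLabels(new LocalEngine(), stream, 1);
        Map<String, Long> partitioned = countLabels(new PartitionedEngine(), stream, 3);
        System.out.println(local + " " + partitioned);
    }
}
```

Both adapters yield the same label totals; only the engine object passed to `countLabels` changes. Merging partial results is trivial here because counts are additive; an algorithm whose state cannot be combined this simply would need a different partitioning or aggregation scheme.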

Original language: English
Article number: 7022733
Pages (from-to): 1199-1202
Number of pages: 4
Journal: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2015-January
Issue number: January
DOIs: 10.1109/ICDMW.2014.24
Publication status: Published - 26 Jan 2015
Externally published: Yes

Fingerprint

Data mining
Learning systems
Engines
Data storage equipment
Big data
Processing

Keywords

  • Classification
  • Clustering
  • Data Streams
  • Distributed Systems
  • Machine Learning
  • Regression
  • Toolbox

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Big data stream learning with SAMOA. / Bifet, Albert; Morales, Gianmarco.

In: IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2015-January, No. January, 7022733, 26.01.2015, p. 1199-1202.

Bifet, Albert ; Morales, Gianmarco. / Big data stream learning with SAMOA. In: IEEE International Conference on Data Mining Workshops, ICDMW. 2015 ; Vol. 2015-January, No. January. pp. 1199-1202.

@article{5476fcf1a74949ed92fafb1748bab77a,
title = "Big data stream learning with SAMOA",
abstract = "Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.",
keywords = "Classification, Clustering, Data Streams, Distributed Systems, Machine Learning, Regression, Toolbox",
author = "Albert Bifet and Gianmarco Morales",
year = "2015",
month = "1",
day = "26",
doi = "10.1109/ICDMW.2014.24",
language = "English",
volume = "2015-January",
pages = "1199--1202",
journal = "IEEE International Conference on Data Mining Workshops, ICDMW",
issn = "2375-9232",
number = "January",

}

TY  - JOUR
T1  - Big data stream learning with SAMOA
AU  - Bifet, Albert
AU  - Morales, Gianmarco
PY  - 2015/1/26
Y1  - 2015/1/26
N2  - Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
AB  - Big data is flowing into every area of our life, professional and personal. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage and analyze, due to the time and memory complexity. Velocity is one of the main properties of big data. In this demo, we present SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm, S4, and Samza. SAMOA is written in Java and is available at http://samoa-project.net under the Apache Software License version 2.0.
KW  - Classification
KW  - Clustering
KW  - Data Streams
KW  - Distributed Systems
KW  - Machine Learning
KW  - Regression
KW  - Toolbox
UR  - http://www.scopus.com/inward/record.url?scp=84936889401&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84936889401&partnerID=8YFLogxK
U2  - 10.1109/ICDMW.2014.24
DO  - 10.1109/ICDMW.2014.24
M3  - Article
VL  - 2015-January
SP  - 1199
EP  - 1202
JO  - IEEE International Conference on Data Mining Workshops, ICDMW
JF  - IEEE International Conference on Data Mining Workshops, ICDMW
SN  - 2375-9232
IS  - January
M1  - 7022733
ER  - 