Discovering clusters with arbitrary shapes and densities in data streams

Amr Magdy, Noha Yousri, Nagwa M. El-Makky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

The availability of streaming data in different fields and in various forms increases the importance of streaming data analysis. The huge size of a continuously flowing data has put forward a number of challenges in data stream analysis. Exploration of the structure of streamed data represented a major challenge that resulted in introducing various clustering algorithms. However, current clustering algorithms still lack the ability to efficiently discover clusters of arbitrary densities in data streams. In this paper, a new grid-based and density-based algorithm is proposed for clustering streaming data. It addresses drawbacks of recent algorithms in discovering clusters of arbitrary densities. The algorithm uses an online component to map the input data to grid cells. An offline component is then used to cluster the grid cells based on density information. Relative density relatedness measures and a dynamic range neighborhood are proposed to differentiate clusters of arbitrary densities. The experimental evaluation shows considerable improvements upon the state-of-the-art algorithms in both clustering quality and scalability. In addition, the output quality of the proposed algorithm is less sensitive to parameter selection errors.
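The two-stage design described in the abstract (an online step that maps points to grid cells, then an offline step that clusters cells by density) can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `map_to_cells`, `cluster_cells`, `cell_size` and `min_count` are hypothetical, and a fixed density threshold with plain cell adjacency is swapped in for the paper's relative density relatedness measures and dynamic range neighborhood, which the abstract does not specify.

```python
from collections import defaultdict, deque
from itertools import product


def map_to_cells(points, cell_size):
    """Online step (sketch): count how many points fall in each grid cell."""
    counts = defaultdict(int)
    for point in points:
        # Floor division assigns each coordinate to an integer cell index.
        cell = tuple(int(coord // cell_size) for coord in point)
        counts[cell] += 1
    return counts


def cluster_cells(counts, min_count):
    """Offline step (sketch): group adjacent dense cells by breadth-first search.

    ASSUMPTION: a fixed min_count threshold stands in for the paper's
    relative density measures, which are not given in the abstract.
    """
    dense = {cell for cell, n in counts.items() if n >= min_count}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit every cell that touches this one (including diagonals).
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in labels:
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.1), (1.5, 0.6),
          (5.1, 5.2), (5.3, 5.4), (9.0, 9.0)]
counts = map_to_cells(points, cell_size=1.0)
labels = cluster_cells(counts, min_count=2)
```

In this toy run, cells (0, 0) and (1, 0) are adjacent and dense, so they share a label; cell (5, 5) forms a second cluster; and the lone point at (9.0, 9.0) falls in a sparse cell that gets no label. A single global threshold like `min_count` is exactly what struggles when clusters have very different densities, which is the limitation the abstract says the paper targets.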

Original language: English
Title of host publication: Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
Pages: 279-282
Number of pages: 4
Volume: 1
DOIs: https://doi.org/10.1109/ICMLA.2011.56
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 10th International Conference on Machine Learning and Applications, ICMLA 2011 - Honolulu, HI, United States
Duration: 18 Dec 2011 to 21 Dec 2011

Other

Other: 10th International Conference on Machine Learning and Applications, ICMLA 2011
Country: United States
City: Honolulu, HI
Period: 18/12/11 to 21/12/11

Fingerprint

Clustering algorithms
Scalability
Availability

Keywords

  • Data streams mining
  • density-based clustering

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction

Cite this

Magdy, A., Yousri, N., & El-Makky, N. M. (2011). Discovering clusters with arbitrary shapes and densities in data streams. In Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011 (Vol. 1, pp. 279-282). [6146984] https://doi.org/10.1109/ICMLA.2011.56

Discovering clusters with arbitrary shapes and densities in data streams. / Magdy, Amr; Yousri, Noha; El-Makky, Nagwa M.

Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011. Vol. 1 2011. p. 279-282 6146984.


Magdy, A, Yousri, N & El-Makky, NM 2011, Discovering clusters with arbitrary shapes and densities in data streams. in Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011. vol. 1, 6146984, pp. 279-282, 10th International Conference on Machine Learning and Applications, ICMLA 2011, Honolulu, HI, United States, 18/12/11. https://doi.org/10.1109/ICMLA.2011.56
Magdy A, Yousri N, El-Makky NM. Discovering clusters with arbitrary shapes and densities in data streams. In Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011. Vol. 1. 2011. p. 279-282. 6146984 https://doi.org/10.1109/ICMLA.2011.56
Magdy, Amr ; Yousri, Noha ; El-Makky, Nagwa M. / Discovering clusters with arbitrary shapes and densities in data streams. Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011. Vol. 1 2011. pp. 279-282
@inproceedings{b1c8a5e8a74646f1a6061d582f22eb2e,
title = "Discovering clusters with arbitrary shapes and densities in data streams",
abstract = "The availability of streaming data in different fields and in various forms increases the importance of streaming data analysis. The huge size of a continuously flowing data has put forward a number of challenges in data stream analysis. Exploration of the structure of streamed data represented a major challenge that resulted in introducing various clustering algorithms. However, current clustering algorithms still lack the ability to efficiently discover clusters of arbitrary densities in data streams. In this paper, a new grid-based and density-based algorithm is proposed for clustering streaming data. It addresses drawbacks of recent algorithms in discovering clusters of arbitrary densities. The algorithm uses an online component to map the input data to grid cells. An offline component is then used to cluster the grid cells based on density information. Relative density relatedness measures and a dynamic range neighborhood are proposed to differentiate clusters of arbitrary densities. The experimental evaluation shows considerable improvements upon the state-of-the-art algorithms in both clustering quality and scalability. In addition, the output quality of the proposed algorithm is less sensitive to parameter selection errors.",
keywords = "Data streams mining, density-based clustering",
author = "Amr Magdy and Noha Yousri and El-Makky, {Nagwa M.}",
year = "2011",
month = "12",
day = "1",
doi = "10.1109/ICMLA.2011.56",
language = "English",
isbn = "9780769546070",
volume = "1",
pages = "279--282",
booktitle = "Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011",

}

TY - GEN
T1 - Discovering clusters with arbitrary shapes and densities in data streams
AU - Magdy, Amr
AU - Yousri, Noha
AU - El-Makky, Nagwa M.
PY - 2011/12/1
Y1 - 2011/12/1
AB - The availability of streaming data in different fields and in various forms increases the importance of streaming data analysis. The huge size of a continuously flowing data has put forward a number of challenges in data stream analysis. Exploration of the structure of streamed data represented a major challenge that resulted in introducing various clustering algorithms. However, current clustering algorithms still lack the ability to efficiently discover clusters of arbitrary densities in data streams. In this paper, a new grid-based and density-based algorithm is proposed for clustering streaming data. It addresses drawbacks of recent algorithms in discovering clusters of arbitrary densities. The algorithm uses an online component to map the input data to grid cells. An offline component is then used to cluster the grid cells based on density information. Relative density relatedness measures and a dynamic range neighborhood are proposed to differentiate clusters of arbitrary densities. The experimental evaluation shows considerable improvements upon the state-of-the-art algorithms in both clustering quality and scalability. In addition, the output quality of the proposed algorithm is less sensitive to parameter selection errors.

KW - Data streams mining
KW - density-based clustering
UR - http://www.scopus.com/inward/record.url?scp=84857823118&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84857823118&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2011.56
DO - 10.1109/ICMLA.2011.56
M3 - Conference contribution
SN - 9780769546070
VL - 1
SP - 279
EP - 282
BT - Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011
ER -