SPARCL: Efficient and effective shape-based clustering

Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

25 Citations (Scopus)

Abstract

Clustering is one of the fundamental data mining tasks. Many clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace methods. The focus of this paper is on full-dimensional, arbitrarily shaped clusters. Existing methods for this problem suffer in either memory or time complexity (quadratic or even cubic), which has restricted them to datasets of moderate size. In this paper we propose SPARCL, a simple and scalable algorithm for finding clusters of arbitrary shapes and sizes with linear space and time complexity. SPARCL consists of two stages: the first runs a carefully initialized version of the K-means algorithm to generate many small seed clusters, and the second iteratively merges these seed clusters to obtain the final shape-based clusters. Experiments on a variety of datasets highlight the effectiveness, efficiency, and scalability of our approach. On large datasets, SPARCL is an order of magnitude faster than the best existing approaches.
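The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the initialization or the merge criterion, so the farthest-point seeding and the nearest-seed-center (single-link) merge rule below are assumptions chosen for brevity, and all function names (`kmeans`, `merge_seeds`, `sparcl_sketch`) are hypothetical. The naive merge loop also makes no attempt to reproduce the paper's linear-time bound.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Stage 1: Lloyd's K-means producing k small seed clusters.

    Farthest-point seeding is an ASSUMED stand-in for the paper's
    'careful initialization', which the abstract does not describe.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Add the point that is farthest from its nearest chosen center.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its points; keep empty seeds in place.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, [nearest(p, centers) for p in points]


def merge_seeds(centers, n_clusters):
    """Stage 2: repeatedly merge the two closest groups of seed clusters.

    Closeness here is the smallest distance between seed centers
    (single link), an ASSUMED criterion, not the paper's similarity measure.
    """
    groups = [[i] for i in range(len(centers))]

    def gap(a, b):
        return min(math.dist(centers[i], centers[j]) for i in a for j in b)

    while len(groups) > n_clusters:
        pairs = (
            (x, y) for x in range(len(groups)) for y in range(x + 1, len(groups))
        )
        a, b = min(pairs, key=lambda xy: gap(groups[xy[0]], groups[xy[1]]))
        merged = groups.pop(b)  # b > a, so index a is still valid
        groups[a].extend(merged)
    return {s: cid for cid, g in enumerate(groups) for s in g}


def sparcl_sketch(points, n_seeds, n_clusters):
    """Run both stages and return one final cluster id per input point."""
    centers, seed_labels = kmeans(points, n_seeds)
    seed_to_cluster = merge_seeds(centers, n_clusters)
    return [seed_to_cluster[s] for s in seed_labels]
```

Calling `sparcl_sketch(points, n_seeds=8, n_clusters=2)` on a list of coordinate tuples returns one cluster id per point. Using many more seeds than final clusters is what lets elongated or curved groups be covered by chains of small convex seed clusters before they are merged.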

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 93-102
Number of pages: 10
DOI: https://doi.org/10.1109/ICDM.2008.73
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy
Duration: 15 Dec 2008 to 19 Dec 2008

Fingerprint

Data mining
Seed
Scalability
Data storage equipment
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Chaoji, V., Al Hasan, M., Salem, S., & Zaki, M. J. (2008). SPARCL: Efficient and effective shape-based clustering. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 93-102). [4781104] https://doi.org/10.1109/ICDM.2008.73

@inproceedings{610bdc12e8034d10a702f4848ee0e3af,
title = "SPARCL: Efficient and effective shape-based clustering",
abstract = "Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, which include partitional, hierarchical, mixture model based, density-based, spectral, subspace, and so on. The focus of this paper is on full-dimensional, arbitrary shaped clusters. Existing methods for this problem suffer either in terms of the memory or time complexity (quadratic or even cubic). This shortcoming has restricted these algorithms to datasets of moderate sizes. In this paper we propose SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes, and it has linear space and time complexity. SPARCL consists of two stages - the first stage runs a carefully initialized version of the Kmeans algorithm to generate many small seed clusters. The second stage iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments were conducted on a variety of datasets to highlight the effectiveness, efficiency, and scalability of our approach. On the large datasets SPARCL is an order of magnitude faster than the best existing approaches.",
author = "Vineet Chaoji and {Al Hasan}, Mohammad and Saeed Salem and Zaki, {Mohammed J.}",
year = "2008",
month = "12",
day = "1",
doi = "10.1109/ICDM.2008.73",
language = "English",
isbn = "9780769535029",
pages = "93--102",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",

}

TY - GEN

T1 - SPARCL

T2 - Efficient and effective shape-based clustering

AU - Chaoji, Vineet

AU - Al Hasan, Mohammad

AU - Salem, Saeed

AU - Zaki, Mohammed J.

PY - 2008/12/1

Y1 - 2008/12/1

N2 - Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, which include partitional, hierarchical, mixture model based, density-based, spectral, subspace, and so on. The focus of this paper is on full-dimensional, arbitrary shaped clusters. Existing methods for this problem suffer either in terms of the memory or time complexity (quadratic or even cubic). This shortcoming has restricted these algorithms to datasets of moderate sizes. In this paper we propose SPARCL, a simple and scalable algorithm for finding clusters with arbitrary shapes and sizes, and it has linear space and time complexity. SPARCL consists of two stages - the first stage runs a carefully initialized version of the Kmeans algorithm to generate many small seed clusters. The second stage iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments were conducted on a variety of datasets to highlight the effectiveness, efficiency, and scalability of our approach. On the large datasets SPARCL is an order of magnitude faster than the best existing approaches.

UR - http://www.scopus.com/inward/record.url?scp=67049172175&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67049172175&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2008.73

DO - 10.1109/ICDM.2008.73

M3 - Conference contribution

SN - 9780769535029

SP - 93

EP - 102

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

ER -