ABACUS: Mining arbitrary shaped clusters from large datasets based on backbone identification

Vineet Chaoji, Geng Li, Hilmi Yildirim, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Citations (Scopus)

Abstract

A wide variety of clustering algorithms exist that cater to applications based on certain special characteristics of the data. Our focus is on methods that capture arbitrarily shaped clusters in data, the so-called spatial clustering algorithms. With the growing size of spatial datasets from diverse sources, the need for scalable algorithms is paramount. We propose a shape-based clustering algorithm, ABACUS, that scales to large datasets. ABACUS is based on the idea of identifying the intrinsic structure of each cluster, which we also refer to as the backbone of that cluster. The backbone comprises a much smaller set of points, which gives this method the desired ability to scale to larger datasets. ABACUS operates in two stages. In the first stage, we identify the backbone of each cluster via an iterative process made up of globbing (or point merging) and point movement operations. In the second stage, the backbone enables easy identification of the true clusters. Experiments on a range of real datasets (images from geospatial satellites, etc.) and synthetic datasets demonstrate the efficiency and effectiveness of our approach. In particular, ABACUS is over an order of magnitude faster than existing shape-based clustering methods, yet it provides comparable or better clustering quality.
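The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the update rules, so the function names (`move`, `glob`, `backbone`, `cluster`), the parameters (`k`, `radius`, `reach`, `iterations`), the weighted-mean movement rule, the greedy merge, and the distance-based linking in the second stage are all assumptions made purely for illustration.

```python
import math

def weighted_mean(reps):
    """Weighted centroid and total weight of (position, weight) pairs."""
    total = sum(w for _, w in reps)
    dim = len(reps[0][0])
    pos = tuple(sum(p[d] * w for p, w in reps) / total for d in range(dim))
    return pos, total

def move(reps, k, reach):
    """Point movement (assumed rule): shift each representative to the weighted
    mean of itself and its at most k nearest neighbours within `reach`."""
    moved = []
    for p, w in reps:
        near = sorted((q for q in reps if math.dist(p, q[0]) <= reach),
                      key=lambda q: math.dist(p, q[0]))[:k + 1]
        moved.append((weighted_mean(near)[0], w))  # position changes, weight kept
    return moved

def glob(reps, radius):
    """Globbing (assumed rule): greedily merge representatives that lie within
    `radius` of an unmerged representative, summing their weights."""
    merged, used = [], [False] * len(reps)
    for i, (p, _) in enumerate(reps):
        if used[i]:
            continue
        group = [j for j in range(i, len(reps))
                 if not used[j] and math.dist(p, reps[j][0]) <= radius]
        for j in group:
            used[j] = True
        merged.append(weighted_mean([reps[j] for j in group]))
    return merged

def backbone(points, k=3, radius=0.5, reach=2.0, iterations=5):
    """Stage one: alternate movement and globbing to shrink the point set."""
    reps = [(tuple(p), 1) for p in points]  # every input point starts with weight 1
    for _ in range(iterations):
        reps = glob(move(reps, k, reach), radius)
    return reps

def cluster(points, k=3, radius=0.5, reach=2.0, iterations=5):
    """Stage two (assumed rule): link backbone points closer than `reach`,
    then label each input point by its nearest backbone point."""
    reps = [p for p, _ in backbone(points, k, radius, reach, iterations)]
    labels = list(range(len(reps)))  # one component per backbone point
    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if math.dist(reps[i], reps[j]) <= reach:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return [labels[min(range(len(reps)), key=lambda r: math.dist(p, reps[r]))]
            for p in points]
```

Calling `cluster(points)` on a list of coordinate tuples returns one integer label per input point. The second stage only ever compares backbone points with each other, which is where the scalability argument in the abstract comes from: the backbone is much smaller than the original dataset.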

Original language: English
Title of host publication: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
Pages: 295-306
Number of pages: 12
Publication status: Published - 1 Dec 2011
Event: 11th SIAM International Conference on Data Mining, SDM 2011 - Mesa, AZ, United States
Duration: 28 Apr 2011 to 30 Apr 2011

Publication series

Name: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011

Other

Other: 11th SIAM International Conference on Data Mining, SDM 2011
Country: United States
City: Mesa, AZ
Period: 28/4/11 to 30/4/11

ASJC Scopus subject areas

  • Software


  • Cite this

    Chaoji, V., Li, G., Yildirim, H., & Zaki, M. J. (2011). ABACUS: Mining arbitrary shaped clusters from large datasets based on backbone identification. In Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 (pp. 295-306). (Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011).