ABACUS

Mining arbitrary shaped clusters from large datasets based on backbone identification

Vineet Chaoji, Geng Li, Hilmi Yildirim, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

16 Citations (Scopus)

Abstract

A wide variety of clustering algorithms exist that cater to applications based on certain special characteristics of the data. Our focus is on methods that capture arbitrarily shaped clusters in data, the so-called spatial clustering algorithms. With the growing size of spatial datasets from diverse sources, the need for scalable algorithms is paramount. We propose a shape-based clustering algorithm, ABACUS, that scales to large datasets. ABACUS is based on the idea of identifying the intrinsic structure of each cluster, which we also refer to as the backbone of that cluster. The backbone comprises a much smaller set of points, which gives the method the desired ability to scale to larger datasets. ABACUS operates in two stages. In the first stage, we identify the backbone of each cluster via an iterative process made up of globbing (or point merging) and point movement operations. In the second stage, the backbone enables easy identification of the true clusters. Experiments on a range of real (images from geospatial satellites, etc.) and synthetic datasets demonstrate the efficiency and effectiveness of our approach. In particular, ABACUS is over an order of magnitude faster than existing shape-based clustering methods, yet it provides comparable or better clustering quality.
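
The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how points are moved or merged, so the sketch assumes a flat-kernel, mean-shift-style movement step, a greedy radius-based globbing step, and a connected-components pass over the backbone for the second stage. All function names and the parameters `reach`, `radius`, `link`, and `iterations` are hypothetical.

```python
import math


def move_points(points, weights, reach):
    """Shift every point to the weighted mean of its neighbours within `reach`."""
    moved = []
    for p in points:
        # A point is always its own neighbour, so `total` is never zero.
        near = [(q, w) for q, w in zip(points, weights) if math.dist(p, q) <= reach]
        total = sum(w for _, w in near)
        moved.append(tuple(
            sum(q[d] * w for q, w in near) / total for d in range(len(p))
        ))
    return moved


def glob_points(points, weights, radius):
    """Greedily merge each point into the first representative within `radius`."""
    kept, kept_w = [], []
    for p, w in zip(points, weights):
        for i, q in enumerate(kept):
            if math.dist(p, q) <= radius:
                total = kept_w[i] + w
                # The representative becomes the weighted mean of what it absorbed.
                kept[i] = tuple((qc * kept_w[i] + pc * w) / total for qc, pc in zip(q, p))
                kept_w[i] = total
                break
        else:
            kept.append(p)
            kept_w.append(w)
    return kept, kept_w


def find_backbone(points, reach, radius, iterations):
    """Stage 1 (assumed form): alternate point movement and globbing."""
    pts = [tuple(p) for p in points]
    wts = [1.0] * len(pts)
    for _ in range(iterations):
        pts = move_points(pts, wts, reach)
        pts, wts = glob_points(pts, wts, radius)
    return pts


def label_backbone(backbone, link):
    """Stage 2 (assumed form): connected components over backbone points."""
    labels = [-1] * len(backbone)
    current = 0
    for start in range(len(backbone)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(backbone)):
                if labels[j] == -1 and math.dist(backbone[i], backbone[j]) <= link:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def assign_points(points, backbone, backbone_labels):
    """Give each original point the label of its nearest backbone point."""
    return [
        backbone_labels[min(range(len(backbone)), key=lambda i: math.dist(p, backbone[i]))]
        for p in points
    ]


if __name__ == "__main__":
    grid = [0.0, 0.2, 0.4]
    blob_a = [(x, y) for x in grid for y in grid]
    blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
    data = blob_a + blob_b
    bb = find_backbone(data, reach=0.3, radius=0.5, iterations=3)
    labels = assign_points(data, bb, label_backbone(bb, link=2.0))
    print(len(data), "points reduced to", len(bb), "backbone points")
    print(labels)
```

The point of the sketch is the cost structure the abstract describes: the quadratic neighbour searches run on a point set that shrinks after each globbing pass, and the final clustering touches only the backbone rather than the full dataset.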

Original language: English
Title of host publication: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
Pages: 295-306
Number of pages: 12
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 11th SIAM International Conference on Data Mining, SDM 2011 - Mesa, AZ, United States
Duration: 28 Apr 2011 to 30 Apr 2011

Other

Other: 11th SIAM International Conference on Data Mining, SDM 2011
Country: United States
City: Mesa, AZ
Period: 28/4/11 to 30/4/11

Fingerprint

Clustering algorithms
Merging
Satellites
Experiments

ASJC Scopus subject areas

  • Software

Cite this

Chaoji, V., Li, G., Yildirim, H., & Zaki, M. J. (2011). ABACUS: Mining arbitrary shaped clusters from large datasets based on backbone identification. In Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 (pp. 295-306)

@inproceedings{1eec881047a544d098e979854bdc9f13,
title = "ABACUS: Mining arbitrary shaped clusters from large datasets based on backbone identification",
author = "Vineet Chaoji and Geng Li and Hilmi Yildirim and Zaki, {Mohammed J.}",
year = "2011",
month = "12",
day = "1",
language = "English",
isbn = "9780898719925",
pages = "295--306",
booktitle = "Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011",

}

TY - GEN

T1 - ABACUS

T2 - Mining arbitrary shaped clusters from large datasets based on backbone identification

AU - Chaoji, Vineet

AU - Li, Geng

AU - Yildirim, Hilmi

AU - Zaki, Mohammed J.

PY - 2011/12/1

Y1 - 2011/12/1

UR - http://www.scopus.com/inward/record.url?scp=80052396852&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052396852&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9780898719925

SP - 295

EP - 306

BT - Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011

ER -