Robust partitional clustering by outlier and density insensitive seeding

Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Mohammed J. Zaki

Research output: Contribution to journal › Article

38 Citations (Scopus)

Abstract

The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it converges to a locally optimal solution that depends strongly on its initial seeds. Bad initial seeds can also cause natural clusters to be split or merged, even when the clusters are well separated. In this paper, we propose ROBIN, a novel method for initial seed selection in k-means-type algorithms. ROBIN imposes constraints on the chosen seeds that lead to a better clustering when k-means converges. These constraints make the seed selection insensitive to outliers in the data and also help it handle clusters of variable density or scale. Furthermore, the constraints make the method deterministic, so a single run suffices to obtain good initial seeds, whereas traditional random seeding approaches need many runs to find seeds that lead to a satisfactory clustering. We comprehensively evaluate ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets and show that ROBIN consistently outperforms existing approaches in terms of clustering quality.
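The abstract does not spell out ROBIN's constraints, so the following is only a hedged illustration of the general idea it describes: deterministic seeding that refuses to place a seed on an outlier, followed by ordinary k-means. The outlier test (a simplified Local Outlier Factor, LOF), the threshold `lof_max=1.5`, the data centroid as reference point and the farthest-first rule are all assumptions chosen for this sketch, not the authors' algorithm; the function names `lof_scores`, `robust_seeds`, `kmeans` and `blob` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lof_scores(points: Sequence[Point], n_neighbors: int) -> List[float]:
    """Simplified Local Outlier Factor (assumed outlier test, not from the paper).

    Uses exactly n_neighbors nearest neighbours per point (ties at the
    k-distance are not expanded). Scores near 1 mean "as dense as the
    neighbours"; scores well above 1 mean "much sparser than the neighbours".
    """
    n = len(points)
    if not 0 < n_neighbors < n:
        raise ValueError("need 0 < n_neighbors < number of points")
    # k nearest neighbours of each point as (distance, index) pairs, O(n^2 log n).
    neigh = []
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neigh.append(others[:n_neighbors])
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def robust_seeds(points: Sequence[Point], k: int, n_neighbors: int = 4,
                 lof_max: float = 1.5) -> List[int]:
    """Hypothetical deterministic seeding: farthest-first over low-LOF points only."""
    scores = lof_scores(points, n_neighbors)
    # Assumed constraint: points whose LOF exceeds lof_max are never used as seeds.
    cand = [i for i, s in enumerate(scores) if s <= lof_max]
    if len(cand) < k:
        raise ValueError("fewer than k non-outlier points")
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    # First seed (assumption): the inlier farthest from the data centroid.
    seeds = [max(cand, key=lambda i: math.dist(points[i], centroid))]
    # Later seeds: the inlier whose distance to the nearest chosen seed is largest.
    while len(seeds) < k:
        rest = [i for i in cand if i not in seeds]
        seeds.append(max(rest, key=lambda i: min(math.dist(points[i], points[s]) for s in seeds)))
    return seeds


def kmeans(points: Sequence[Point], seed_idx: Sequence[int], max_iter: int = 100):
    """Plain Lloyd's k-means started from the given seed indices."""
    centers = [tuple(points[i]) for i in seed_idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to its cluster mean (kept if the cluster empties).
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def blob(bx: float, by: float) -> List[Point]:
    """Toy cluster: the corners of a unit square plus its centre."""
    return [(bx, by), (bx, by + 1), (bx + 1, by), (bx + 1, by + 1), (bx + 0.5, by + 0.5)]


# Three well-separated toy clusters plus one far-away outlier (index 15).
data = blob(0, 0) + blob(20, 0) + blob(10, 20) + [(40, 40)]
seeds = robust_seeds(data, k=3)
labels, centers = kmeans(data, seeds)
print(seeds)   # -> [0, 13, 8]
print(labels)  # -> [0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1]
```

Because no randomness is involved, repeated calls return the same seeds, one per natural cluster, and the outlier at (40, 40) is never chosen. The sketch only changes initialization: plain k-means still assigns the outlier to its nearest cluster, where it pulls that cluster's centroid.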

Original language: English
Pages (from-to): 994-1002
Number of pages: 9
Journal: Pattern Recognition Letters
Volume: 30
Issue number: 11
DOI: 10.1016/j.patrec.2009.04.013
Publication status: Published - 1 Aug 2009
Externally published: Yes

Keywords

  • k-Means
  • Partitional clustering
  • Robust initialization
  • Seed selection

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing

Cite this

Robust partitional clustering by outlier and density insensitive seeding. / Hasan, Mohammad Al; Chaoji, Vineet; Salem, Saeed; Zaki, Mohammed J.

In: Pattern Recognition Letters, Vol. 30, No. 11, 01.08.2009, p. 994-1002.

Hasan, Mohammad Al ; Chaoji, Vineet ; Salem, Saeed ; Zaki, Mohammed J. / Robust partitional clustering by outlier and density insensitive seeding. In: Pattern Recognition Letters. 2009 ; Vol. 30, No. 11. pp. 994-1002.
@article{58cb0b8448e240499af9f0c769a09ed5,
title = "Robust partitional clustering by outlier and density insensitive seeding",
keywords = "k-Means, Partitional clustering, Robust initialization, Seed selection",
author = "Hasan, {Mohammad Al} and Vineet Chaoji and Saeed Salem and Zaki, {Mohammed J.}",
year = "2009",
month = "8",
day = "1",
doi = "10.1016/j.patrec.2009.04.013",
language = "English",
volume = "30",
pages = "994--1002",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
number = "11",

}

TY - JOUR

T1 - Robust partitional clustering by outlier and density insensitive seeding

AU - Hasan, Mohammad Al

AU - Chaoji, Vineet

AU - Salem, Saeed

AU - Zaki, Mohammed J.

PY - 2009/8/1

Y1 - 2009/8/1

AB - The leading partitional clustering technique, k-means, is one of the most computationally efficient clustering methods. However, it produces a local optimal solution that strongly depends on its initial seeds. Bad initial seeds can also cause the splitting or merging of natural clusters even if the clusters are well separated. In this paper, we propose, ROBIN, a novel method for initial seed selection in k-means types of algorithms. It imposes constraints on the chosen seeds that lead to better clustering when k-means converges. The constraints make the seed selection method insensitive to outliers in the data and also assist it to handle variable density or multi-scale clusters. Furthermore, they (constraints) make the method deterministic, so only one run suffices to obtain good initial seeds, as opposed to traditional random seed selection approaches that need many runs to obtain good seeds that lead to satisfactory clustering. We did a comprehensive evaluation of ROBIN against state-of-the-art seeding methods on a wide range of synthetic and real datasets. We show that ROBIN consistently outperforms existing approaches in terms of the clustering quality.

KW - k-Means

KW - Partitional clustering

KW - Robust initialization

KW - Seed selection

UR - http://www.scopus.com/inward/record.url?scp=67649088034&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649088034&partnerID=8YFLogxK

U2 - 10.1016/j.patrec.2009.04.013

DO - 10.1016/j.patrec.2009.04.013

M3 - Article

VL - 30

SP - 994

EP - 1002

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

SN - 0167-8655

IS - 11

ER -