A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities

Noha Yousri, Mohamed S. Kamel, Mohamed A. Ismail

Research output: Contribution to journal › Article

32 Citations (Scopus)

Abstract

It is important to find the natural clusters in high dimensional data where visualization becomes difficult. A natural cluster is a cluster of any shape and density, and it should not be restricted to a globular shape as a wide number of algorithms assume, or to a specific user-defined density as some density-based algorithms require. In this work, it is proposed to solve the problem by maximizing the relatedness of distances between patterns in the same cluster. It is then possible to distinguish clusters based on their distance-based densities. A novel dynamic model is proposed based on new distance-relatedness measures and clustering criteria. The proposed algorithm "Mitosis" is able to discover clusters of arbitrary shapes and arbitrary densities in high dimensional data. It has a good computational complexity compared to related algorithms. It performs very well on high dimensional data, discovering clusters that cannot be found by known algorithms. It also identifies outliers in the data as a by-product of the cluster formation process. A validity measure that depends on the main clustering criterion is also proposed to tune the algorithm's parameters. The theoretical bases of the algorithm and its steps are presented. Its performance is illustrated by comparing it with related algorithms on several data sets.

Original language: English
Pages (from-to): 1193-1209
Number of pages: 17
Journal: Pattern Recognition
Volume: 42
Issue number: 7
DOIs: 10.1016/j.patcog.2008.08.037
Publication status: Published - 1 Jul 2009
Externally published: Yes

Keywords

  • Arbitrary density clusters
  • Arbitrary shaped clusters
  • Clustering
  • Distance-relatedness
  • Dynamic model
  • High dimensional data

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities. / Yousri, Noha; Kamel, Mohamed S.; Ismail, Mohamed A.

In: Pattern Recognition, Vol. 42, No. 7, 01.07.2009, p. 1193-1209.

@article{2cee5708dd9f4c0d96cfbb9bbec37f76,
title = "A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities",
abstract = "It is important to find the natural clusters in high dimensional data where visualization becomes difficult. A natural cluster is a cluster of any shape and density, and it should not be restricted to a globular shape as a wide number of algorithms assume, or to a specific user-defined density as some density-based algorithms require. In this work, it is proposed to solve the problem by maximizing the relatedness of distances between patterns in the same cluster. It is then possible to distinguish clusters based on their distance-based densities. A novel dynamic model is proposed based on new distance-relatedness measures and clustering criteria. The proposed algorithm {"}Mitosis{"} is able to discover clusters of arbitrary shapes and arbitrary densities in high dimensional data. It has a good computational complexity compared to related algorithms. It performs very well on high dimensional data, discovering clusters that cannot be found by known algorithms. It also identifies outliers in the data as a by-product of the cluster formation process. A validity measure that depends on the main clustering criterion is also proposed to tune the algorithm's parameters. The theoretical bases of the algorithm and its steps are presented. Its performance is illustrated by comparing it with related algorithms on several data sets.",
keywords = "Arbitrary density clusters, Arbitrary shaped clusters, Clustering, Distance-relatedness, Dynamic model, High dimensional data",
author = "Yousri, Noha and Kamel, {Mohamed S.} and Ismail, {Mohamed A.}",
year = "2009",
month = "7",
day = "1",
doi = "10.1016/j.patcog.2008.08.037",
language = "English",
volume = "42",
pages = "1193--1209",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",
number = "7",
}

TY - JOUR

T1 - A distance-relatedness dynamic model for clustering high dimensional data of arbitrary shapes and densities

AU - Yousri, Noha

AU - Kamel, Mohamed S.

AU - Ismail, Mohamed A.

PY - 2009/7/1

Y1 - 2009/7/1

AB - It is important to find the natural clusters in high dimensional data where visualization becomes difficult. A natural cluster is a cluster of any shape and density, and it should not be restricted to a globular shape as a wide number of algorithms assume, or to a specific user-defined density as some density-based algorithms require. In this work, it is proposed to solve the problem by maximizing the relatedness of distances between patterns in the same cluster. It is then possible to distinguish clusters based on their distance-based densities. A novel dynamic model is proposed based on new distance-relatedness measures and clustering criteria. The proposed algorithm "Mitosis" is able to discover clusters of arbitrary shapes and arbitrary densities in high dimensional data. It has a good computational complexity compared to related algorithms. It performs very well on high dimensional data, discovering clusters that cannot be found by known algorithms. It also identifies outliers in the data as a by-product of the cluster formation process. A validity measure that depends on the main clustering criterion is also proposed to tune the algorithm's parameters. The theoretical bases of the algorithm and its steps are presented. Its performance is illustrated by comparing it with related algorithms on several data sets.

KW - Arbitrary density clusters

KW - Arbitrary shaped clusters

KW - Clustering

KW - Distance-relatedness

KW - Dynamic model

KW - High dimensional data

UR - http://www.scopus.com/inward/record.url?scp=62349090984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62349090984&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2008.08.037

DO - 10.1016/j.patcog.2008.08.037

M3 - Article

AN - SCOPUS:62349090984

VL - 42

SP - 1193

EP - 1209

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

IS - 7

ER -