Clustering with lower bound on similarity

Mohammad Al Hasan, Saeed Salem, Benjarath Pupacdi, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of taking k, the number of clusters to find, as input, this alternative similarity-based approach imposes a lower bound on the similarity between each object and its cluster representative (with one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas the bound for the best previous algorithm can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40% fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm that can be used effectively in an online setting.
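The abstract does not spell out how SimClus works, so the following is only an illustrative sketch of the problem setting, not the authors' algorithm. It treats representative selection as a set-cover instance: each object "covers" every object whose similarity to it meets the lower bound, and a standard greedy set-cover heuristic (known to be within a factor of roughly ln n of the optimal cover size) picks representatives until all objects are covered. The function names, the similarity-matrix input, and the assumptions that the matrix is symmetric and that every object is maximally similar to itself are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def greedy_representatives(sim: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Pick representatives so every object has similarity >= threshold to one.

    Illustrative greedy set cover; NOT the SimClus algorithm from the paper.
    Assumes `sim` is a symmetric n x n similarity matrix.
    """
    n = len(sim)
    # covers[i] = objects that object i could represent (always includes i itself,
    # which is an assumption: an object is taken to satisfy the bound with itself).
    covers = [{j for j in range(n) if sim[i][j] >= threshold} | {i} for i in range(n)]
    uncovered = set(range(n))
    reps: List[int] = []
    while uncovered:
        # Greedy step: choose the object covering the most still-uncovered objects;
        # ties are broken in favour of the smallest index.
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        reps.append(best)
        uncovered -= covers[best]
    return reps


def assign_to_representatives(sim: Sequence[Sequence[float]], reps: List[int]) -> List[int]:
    """Label each object with its most similar representative."""
    return [max(reps, key=lambda r: sim[j][r]) for j in range(len(sim))]


if __name__ == "__main__":
    # Toy data: objects 0-2 are mutually similar, as are objects 3-4.
    sim = [
        [1.0, 0.9, 0.8, 0.1, 0.2],
        [0.9, 1.0, 0.85, 0.2, 0.1],
        [0.8, 0.85, 1.0, 0.3, 0.2],
        [0.1, 0.2, 0.3, 1.0, 0.9],
        [0.2, 0.1, 0.2, 0.9, 1.0],
    ]
    reps = greedy_representatives(sim, threshold=0.75)
    print(reps)                                   # [0, 3]
    print(assign_to_representatives(sim, reps))   # [0, 0, 0, 3, 3]
```

Note that the number of clusters is an output here rather than an input: raising the threshold can only keep or increase the number of representatives needed, which is the trade-off the similarity-bound formulation exposes in place of k.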

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 122-133
Number of pages: 12
Volume: 5476 LNAI
DOIs: https://doi.org/10.1007/978-3-642-01307-2_14
Publication status: Published - 23 Jul 2009
Externally published: Yes
Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
Duration: 27 Apr 2009 to 30 Apr 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5476 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Clustering
Number of Clusters
Lower bound
Alternatives
Approximation
Experiment
Similarity
Experiments
Object

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Hasan, M. A., Salem, S., Pupacdi, B., & Zaki, M. J. (2009). Clustering with lower bound on similarity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5476 LNAI, pp. 122-133). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5476 LNAI). https://doi.org/10.1007/978-3-642-01307-2_14

@inproceedings{ae1616e303194b4c9b0d48a639b6e27f,
title = "Clustering with lower bound on similarity",
abstract = "We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of taking k, the number of clusters to find, as input, this alternative similarity-based approach imposes a lower bound on the similarity between each object and its cluster representative (with one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas the bound for the best previous algorithm can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40{\%} fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm that can be used effectively in an online setting.",
author = "Hasan, {Mohammad Al} and Saeed Salem and Benjarath Pupacdi and Zaki, {Mohammed J.}",
year = "2009",
month = "7",
day = "23",
doi = "10.1007/978-3-642-01307-2_14",
language = "English",
isbn = "3642013066",
volume = "5476 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "122--133",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY  - GEN
T1  - Clustering with lower bound on similarity
AU  - Hasan, Mohammad Al
AU  - Salem, Saeed
AU  - Pupacdi, Benjarath
AU  - Zaki, Mohammed J.
PY  - 2009/7/23
Y1  - 2009/7/23
AB  - We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of taking k, the number of clusters to find, as input, this alternative similarity-based approach imposes a lower bound on the similarity between each object and its cluster representative (with one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas the bound for the best previous algorithm can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40% fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm that can be used effectively in an online setting.
UR  - http://www.scopus.com/inward/record.url?scp=67650697693&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=67650697693&partnerID=8YFLogxK
U2  - 10.1007/978-3-642-01307-2_14
DO  - 10.1007/978-3-642-01307-2_14
M3  - Conference contribution
SN  - 3642013066
SN  - 9783642013065
VL  - 5476 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 122
EP  - 133
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -