A new efficient and unbiased approach for clustering quality evaluation

Jean Charles Lamirel, Pascal Cuxac, Raghvendra Mall, Ghada Safi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Traditional quality indexes (Inertia, DB, ...) are known to be method-dependent and fail to properly estimate clustering quality in several cases, notably for complex data such as textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure exploiting the descriptors of the data associated with the obtained clusters. Two categories of indexes are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerated, results. The experimental comparison of the behavior of the classical indexes with our new approach is performed on a polythematic dataset of bibliographical references drawn from the PASCAL database.

Original language: English
Title of host publication: New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers
Pages: 209-220
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-28320-8_18
Publication status: Published - 7 Mar 2012
Externally published: Yes
Event: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011 - Shenzhen
Duration: 24 May 2011 to 27 May 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7104 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011
City: Shenzhen
Period: 24/5/11 to 27/5/11

Keywords

  • clustering
  • labeling maximization
  • quality indexes
  • unsupervised precision
  • unsupervised recall

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Lamirel, J. C., Cuxac, P., Mall, R., & Safi, G. (2012). A new efficient and unbiased approach for clustering quality evaluation. In New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers (pp. 209-220). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7104 LNAI). https://doi.org/10.1007/978-3-642-28320-8_18