A new efficient and unbiased approach for clustering quality evaluation

Jean Charles Lamirel, Pascal Cuxac, Raghvendra Mall, Ghada Safi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Traditional quality indexes (Inertia, DB, ...) are known to be method-dependent and, in several cases, do not allow the quality of a clustering to be properly estimated, notably for complex data such as textual data. We thus propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of index are proposed: Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous results and heterogeneous, or degenerated, ones. The behavior of the classical indexes is compared experimentally with that of our new approach on a polythematic dataset of bibliographic references drawn from the PASCAL database.

Original language: English
Title of host publication: New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers
Pages: 209-220
Number of pages: 12
DOIs: https://doi.org/10.1007/978-3-642-28320-8_18
Publication status: Published - 7 Mar 2012
Externally published: Yes
Event: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011 - Shenzhen
Duration: 24 May 2011 → 27 May 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7104 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011
City: Shenzhen
Period: 24/5/11 → 27/5/11

Keywords

  • clustering
  • labeling maximization
  • quality indexes
  • unsupervised precision
  • unsupervised recall

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Lamirel, J. C., Cuxac, P., Mall, R., & Safi, G. (2012). A new efficient and unbiased approach for clustering quality evaluation. In New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers (pp. 209-220). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7104 LNAI). https://doi.org/10.1007/978-3-642-28320-8_18

A new efficient and unbiased approach for clustering quality evaluation. / Lamirel, Jean Charles; Cuxac, Pascal; Mall, Raghvendra; Safi, Ghada.

New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers. 2012. p. 209-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7104 LNAI).

Lamirel, JC, Cuxac, P, Mall, R & Safi, G 2012, A new efficient and unbiased approach for clustering quality evaluation. in New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7104 LNAI, pp. 209-220, 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011, Shenzhen, 24/5/11. https://doi.org/10.1007/978-3-642-28320-8_18
Lamirel JC, Cuxac P, Mall R, Safi G. A new efficient and unbiased approach for clustering quality evaluation. In New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers. 2012. p. 209-220. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-28320-8_18
Lamirel, Jean Charles ; Cuxac, Pascal ; Mall, Raghvendra ; Safi, Ghada. / A new efficient and unbiased approach for clustering quality evaluation. New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers. 2012. pp. 209-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{2584f573fde641ada5c970c43774b180,
title = "A new efficient and unbiased approach for clustering quality evaluation",
abstract = "Traditional quality indexes (Inertia, DB, ...) are known to be method-dependent indexes that do not allow to properly estimate the quality of the clustering in several cases, as in that one of complex data, like textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure exploiting the descriptors of the data associated with the obtained clusters. Two categories of index are proposed, that are Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerated results. The experimental comparison of the behavior of the classical indexes with our new approach is performed on a polythematic dataset of bibliographical references issued from the PASCAL database.",
keywords = "clustering, labeling maximization, quality indexes, unsupervised precision, unsupervised recall",
author = "Lamirel, {Jean Charles} and Cuxac, Pascal and Mall, Raghvendra and Safi, Ghada",
year = "2012",
month = "3",
day = "7",
doi = "10.1007/978-3-642-28320-8_18",
language = "English",
isbn = "9783642283192",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "209--220",
booktitle = "New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers",

}

TY - GEN

T1 - A new efficient and unbiased approach for clustering quality evaluation

AU - Lamirel, Jean Charles

AU - Cuxac, Pascal

AU - Mall, Raghvendra

AU - Safi, Ghada

PY - 2012/3/7

Y1 - 2012/3/7

AB - Traditional quality indexes (Inertia, DB, ...) are known to be method-dependent indexes that do not allow to properly estimate the quality of the clustering in several cases, as in that one of complex data, like textual data. We thus propose an alternative approach for clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure exploiting the descriptors of the data associated with the obtained clusters. Two categories of index are proposed, that are Macro and Micro indexes. This paper also focuses on the construction of a new cumulative Micro precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous and heterogeneous, or degenerated results. The experimental comparison of the behavior of the classical indexes with our new approach is performed on a polythematic dataset of bibliographical references issued from the PASCAL database.

KW - clustering

KW - labeling maximization

KW - quality indexes

KW - unsupervised precision

KW - unsupervised recall

UR - http://www.scopus.com/inward/record.url?scp=84857719323&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857719323&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-28320-8_18

DO - 10.1007/978-3-642-28320-8_18

M3 - Conference contribution

AN - SCOPUS:84857719323

SN - 9783642283192

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 209

EP - 220

BT - New Frontiers in Applied Data Mining - PAKDD 2011 International Workshops, Revised Selected Papers

ER -