Assessing clustering reliability and features informativeness by random permutations

Michele Ceccarelli, Antonio Maratea

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. Looking at this ensemble, it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
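The abstract does not say which quality index, clustering algorithm, or permutation scheme the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: a small k-means stands in for "any clustering algorithm", the index is the fraction of total scatter explained by the partition (1 - WSS/TSS), and one feature column at a time is randomly permuted across samples. All function names (`kmeans`, `quality`, `permute_feature`, `permutation_ensemble`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Column-wise mean of a list of points."""
    return [mean(col) for col in zip(*points)]

def kmeans(points, k, rng, iters=25):
    """Tiny Lloyd's k-means (assumed stand-in for any clustering algorithm)."""
    centers = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels

def quality(points, labels):
    """Assumed quality index: 1 - within-cluster scatter / total scatter."""
    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        c = centroid(members)
        wss += sum(sq_dist(p, c) for p in members)
    return 1.0 - wss / tss

def best_quality(points, k, rng, restarts=3):
    """Best index value over a few random k-means restarts."""
    return max(quality(points, kmeans(points, k, rng)) for _ in range(restarts))

def permute_feature(points, j, rng):
    """Copy of the data with column j randomly permuted across samples."""
    column = [p[j] for p in points]
    rng.shuffle(column)
    return [p[:j] + [v] + p[j + 1:] for p, v in zip(points, column)]

def permutation_ensemble(points, k, n_perm=20, seed=0):
    """Baseline index plus, per feature, the index values after permuting it."""
    rng = random.Random(seed)
    baseline = best_quality(points, k, rng)
    ensemble = {}
    for j in range(len(points[0])):
        ensemble[j] = [best_quality(permute_feature(points, j, rng), k, rng)
                       for _ in range(n_perm)]
    return baseline, ensemble

# Toy data (assumption): features 0 and 1 jointly separate two groups,
# feature 2 is pure noise.
data_rng = random.Random(1)
data = [[data_rng.gauss(m, 0.5), data_rng.gauss(m, 0.5), data_rng.gauss(0, 1)]
        for m in (0, 5) for _ in range(30)]

baseline, ensemble = permutation_ensemble(data, k=2)
for j, scores in ensemble.items():
    print(f"feature {j}: baseline {baseline:.2f}, "
          f"mean after permutation {mean(scores):.2f}")
```

In this sketch, permuting a column leaves that column's own distribution (and hence the total scatter) unchanged, so index values stay comparable across permutations. A feature whose permutation shifts the empirical distribution well below the baseline is one the partition depends on; a feature whose permutation leaves it roughly in place contributes little. Repeating the loop for several values of `k` would give one ensemble per candidate parameter value, which is one plausible reading of how the abstract's parameter selection could work.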

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 878-885
Number of pages: 8
Volume: 4694 LNAI
Edition: PART 3
Publication status: Published - 1 Dec 2007
Externally published: Yes
Event: 11th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2007, and 17th Italian Workshop on Neural Networks, WIRN 2007 - Vietri sul Mare, Italy
Duration: 12 Sep 2007 to 14 Sep 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 3
Volume: 4694 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 11th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2007, and 17th Italian Workshop on Neural Networks, WIRN 2007
Country: Italy
City: Vietri sul Mare
Period: 12/9/07 to 14/9/07

Fingerprint

Random Permutation
Clustering algorithms
Probability distributions
Cluster Analysis
Clustering
Ensemble
Empirical Distribution
Number of Clusters
Clustering Algorithm
Probability Distribution
Partition
Evaluate

Keywords

  • Cluster stability
  • Feature selection
  • Validation index

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Ceccarelli, M., & Maratea, A. (2007). Assessing clustering reliability and features informativeness by random permutations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 3 ed., Vol. 4694 LNAI, pp. 878-885). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4694 LNAI, No. PART 3).

@inproceedings{4546874d28644c97896a83fe027217b6,
title = "Assessing clustering reliability and features informativeness by random permutations",
abstract = "Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. Looking at this ensemble, it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.",
keywords = "Cluster stability, Feature selection, Validation index",
author = "Michele Ceccarelli and Antonio Maratea",
year = "2007",
month = "12",
day = "1",
language = "English",
isbn = "9783540748281",
volume = "4694 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 3",
pages = "878--885",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 3",

}

TY  - GEN
T1  - Assessing clustering reliability and features informativeness by random permutations
AU  - Ceccarelli, Michele
AU  - Maratea, Antonio
PY  - 2007/12/1
Y1  - 2007/12/1
AB  - Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. Looking at this ensemble, it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
KW  - Cluster stability
KW  - Feature selection
KW  - Validation index
UR  - http://www.scopus.com/inward/record.url?scp=38049101392&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=38049101392&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 9783540748281
VL  - 4694 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 878
EP  - 885
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -