Semi-supervised fuzzy c-means clustering of biological data

Michele Ceccarelli, A. Maratea

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Semi-supervised methods use a small amount of labeled data as a guide to unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even with a small amount of side-information. This suggests that semi-supervised methods may be especially useful in very difficult and noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We used three real biological datasets and a generalized version of the Partition Entropy index to validate our results. In all cases tested, the metric learning step better highlighted the clustering structure of the datasets.
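For orientation, the clustering and validation steps named in the abstract can be sketched as follows. This is a minimal illustration of textbook fuzzy c-means with Euclidean distance and Bezdek's classical Partition Entropy, not the authors' implementation: the paper applies a learned metric first and uses a generalized version of the index, neither of which is reproduced here. The function names, the fuzzifier `m=2.0`, the toy data, and the seed are all assumptions made for the example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means with Euclidean distance (illustrative only).

    Returns (centers, U), where U[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])

    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships are recomputed from distances to the new centers.
        new_U = []
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # The point coincides with a center: assign it crisply.
                zeros = [k for k in range(c) if dist[k] == 0.0]
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                s = sum(inv)
                row = [v / s for v in inv]
            new_U.append(row)
        change = max(abs(new_U[i][k] - U[i][k])
                     for i in range(n) for k in range(c))
        U = new_U
        if change < tol:
            break
    return centers, U


def partition_entropy(U):
    """Classical partition entropy: -(1/n) * sum_i sum_k u_ik * ln(u_ik)."""
    n = len(U)
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / n


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, U = fuzzy_c_means(data, c=2)
pe = partition_entropy(U)
```

The index ranges from 0 for a crisp partition to ln(c) for a maximally fuzzy one, so lower values indicate a sharper clustering structure. In a metric-learning setting, the Euclidean distance in the membership update would be replaced by the learned distance.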

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 259-266
Number of pages: 8
Volume: 3849 LNAI
DOI: https://doi.org/10.1007/11676935_32
Publication status: Published - 23 Jun 2006
Externally published: Yes
Event: 6th International Workshop - Fuzzy Logic and Applications - Crema, Italy
Duration: 15 Sep 2005 to 17 Sep 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3849 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Workshop - Fuzzy Logic and Applications
Country: Italy
City: Crema
Period: 15/9/05 to 17/9/05

Fingerprint

Fuzzy C-means Clustering
Fuzzy clustering
Cluster Analysis
Side Information
Clustering
Metric
Semi-supervised Clustering
Learning
Entropy
Paradigm
Partition
Datasets

Keywords

  • Adaptive Metric
  • Fuzzy Clustering
  • Semi-Supervised Learning
  • Validity Index

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Ceccarelli, M., & Maratea, A. (2006). Semi-supervised fuzzy c-means clustering of biological data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3849 LNAI, pp. 259-266). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3849 LNAI). https://doi.org/10.1007/11676935_32

@inproceedings{2c47fe1ca5934b0f8d8d43ea8225ec47,
title = "Semi-supervised fuzzy c-means clustering of biological data",
abstract = "Semi-supervised methods use a small amount of labeled data as a guide to unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even with a small amount of side-information. This suggests that semi-supervised methods may be especially useful in very difficult and noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We used three real biological datasets and a generalized version of the Partition Entropy index to validate our results. In all cases tested, the metric learning step better highlighted the clustering structure of the datasets.",
keywords = "Adaptive Metric, Fuzzy Clustering, Semi-Supervised Learning, Validity Index",
author = "Michele Ceccarelli and A. Maratea",
year = "2006",
month = "6",
day = "23",
doi = "10.1007/11676935_32",
language = "English",
isbn = "3540325298",
volume = "3849 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "259--266",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Semi-supervised fuzzy c-means clustering of biological data

AU - Ceccarelli, Michele

AU - Maratea, A.

PY - 2006/6/23

Y1 - 2006/6/23

AB - Semi-supervised methods use a small amount of labeled data as a guide to unsupervised techniques. Recent literature shows that these methods outperform fully unsupervised ones even with a small amount of side-information. This suggests that semi-supervised methods may be especially useful in very difficult and noisy tasks where little a priori information is available, as is the case in the classification of biological datasets. The two most frequently used paradigms for including side-information in clustering are Constrained Clustering and Metric Learning. In this paper we use a Metric Learning approach as a preliminary step to fuzzy clustering, and we show that Semi-Supervised Fuzzy Clustering (SSFC) can be an effective tool for the classification of biological datasets. We used three real biological datasets and a generalized version of the Partition Entropy index to validate our results. In all cases tested, the metric learning step better highlighted the clustering structure of the datasets.

KW - Adaptive Metric

KW - Fuzzy Clustering

KW - Semi-Supervised Learning

KW - Validity Index

UR - http://www.scopus.com/inward/record.url?scp=33745157484&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745157484&partnerID=8YFLogxK

U2 - 10.1007/11676935_32

DO - 10.1007/11676935_32

M3 - Conference contribution

SN - 3540325298

SN - 9783540325291

VL - 3849 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 259

EP - 266

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -