Data integration via constrained clustering

An application to enzyme clustering

Elisa Boari De Lima, Raquel Cardoso De Melo Minardi, Mohammed Javeed Zaki, Wagner Meira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When multiple data sources are available for clustering, an a priori data integration process is usually required. This process may be costly and may not lead to good clusterings, since important information is likely to be discarded. In this paper we propose constrained clustering as a strategy for integrating data sources without losing any information. The approach adds the complementary data sources as constraints that the clustering algorithm must satisfy. As a concrete application, we focus on the problem of enzyme function prediction, a hard task usually performed through intensive experimental work. We use constrained clustering to integrate information from diverse sources as constraints, and analyze how this additional information affects clustering quality in an enzyme clustering scenario. Our results show that constraints generally improve clustering quality compared to an unconstrained clustering algorithm.
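To make the idea of "constraints that the algorithm must satisfy" concrete, the following is a minimal sketch of hard-constrained k-means in the style of COP-KMeans. It is an illustration only: the abstract does not say which constrained clustering algorithm the authors use, and the function names (`violates`, `cop_kmeans`), the must-link/cannot-link pair format, the toy points, and the first-k-points initialization are all assumptions made here. In a data integration setting, a second data source would be turned into such pairs instead of being merged into the feature vectors.

```python
import math

def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # A must-link partner already sits in a different cluster.
        if other is not None and labels[other] not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        # A cannot-link partner already sits in this cluster.
        if other is not None and labels[other] == cluster:
            return True
    return False

def cop_kmeans(points, k, must_link=(), cannot_link=(), iters=20):
    """Hypothetical COP-KMeans-style sketch; not the paper's implementation."""
    centers = [points[i] for i in range(k)]  # assumption: first k points as seeds
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest center.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Toy primary data source: five points in the plane.
points = [(0.0, 0.0), (5.0, 0.0), (0.2, 0.0), (5.2, 0.0), (2.4, 0.0)]

print(cop_kmeans(points, 2))                       # [0, 1, 0, 1, 0]
# A second source says points 4 and 1 belong together.
print(cop_kmeans(points, 2, must_link=[(4, 1)]))   # [0, 1, 0, 1, 1]
```

In the toy run, point 4 lies slightly closer to the first cluster, so the unconstrained pass places it there, while the must-link pair derived from the hypothetical second source moves it to the second cluster. Note that this greedy scheme depends on the order in which points are visited and can fail when no feasible assignment is found.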

Original language: English
Title of host publication: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011
Pages: 83-94
Number of pages: 12
Publication status: Published - 1 Dec 2011
Externally published: Yes
Event: 11th SIAM International Conference on Data Mining, SDM 2011 - Mesa, AZ, United States
Duration: 28 Apr 2011 to 30 Apr 2011

Keywords

  • Constrained clustering
  • Data integration
  • Enzyme clustering

ASJC Scopus subject areas

  • Software

Cite this

De Lima, E. B., De Melo Minardi, R. C., Zaki, M. J., & Meira, W. (2011). Data integration via constrained clustering: An application to enzyme clustering. In Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011 (pp. 83-94)

@inproceedings{fed1b2860f414dc6b865293de43d314d,
title = "Data integration via constrained clustering: An application to enzyme clustering",
keywords = "Constrained clustering, Data integration, Enzyme clustering",
author = "{De Lima}, {Elisa Boari} and {De Melo Minardi}, {Raquel Cardoso} and Zaki, {Mohammed Javeed} and Meira, Wagner",
year = "2011",
month = "12",
day = "1",
language = "English",
isbn = "9780898719925",
pages = "83--94",
booktitle = "Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011",

}
