The ParTriCluster algorithm for gene expression analysis

Renata Braga Araújo, Guilherme Henrique Trielli Ferreira, Gustavo Henrique Orair, Wagner Meira, Renato Antônio Celso Ferreira, Dorgival Olavo Guedes Neto, Mohammed Javeed Zaki

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Analyzing gene expression patterns is becoming a highly relevant task in the Bioinformatics area. This analysis makes it possible to determine the behavior patterns of genes under various conditions, which is fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, which is the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even though biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck due to its NP-completeness, so its parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate the implementation of a parallel version of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.

Original language: English
Pages (from-to): 226-249
Number of pages: 24
Journal: International Journal of Parallel Programming
Volume: 36
Issue number: 2
DOI: https://doi.org/10.1007/s10766-007-0067-9
Publication status: Published - 1 Apr 2008
Externally published: Yes

Keywords

  • Bioinformatics
  • Clustering
  • Depth-first search
  • Parallel programming

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Araújo, R. B., Ferreira, G. H. T., Orair, G. H., Meira, W., Ferreira, R. A. C., Neto, D. O. G., & Zaki, M. J. (2008). The ParTriCluster algorithm for gene expression analysis. International Journal of Parallel Programming, 36(2), 226-249. https://doi.org/10.1007/s10766-007-0067-9

The ParTriCluster algorithm for gene expression analysis. / Araújo, Renata Braga; Ferreira, Guilherme Henrique Trielli; Orair, Gustavo Henrique; Meira, Wagner; Ferreira, Renato Antônio Celso; Neto, Dorgival Olavo Guedes; Zaki, Mohammed Javeed.

In: International Journal of Parallel Programming, Vol. 36, No. 2, 01.04.2008, p. 226-249.

Araújo, RB, Ferreira, GHT, Orair, GH, Meira, W, Ferreira, RAC, Neto, DOG & Zaki, MJ 2008, 'The ParTriCluster algorithm for gene expression analysis', International Journal of Parallel Programming, vol. 36, no. 2, pp. 226-249. https://doi.org/10.1007/s10766-007-0067-9

Araújo RB, Ferreira GHT, Orair GH, Meira W, Ferreira RAC, Neto DOG et al. The ParTriCluster algorithm for gene expression analysis. International Journal of Parallel Programming. 2008 Apr 1;36(2):226-249. https://doi.org/10.1007/s10766-007-0067-9

Araújo, Renata Braga ; Ferreira, Guilherme Henrique Trielli ; Orair, Gustavo Henrique ; Meira, Wagner ; Ferreira, Renato Antônio Celso ; Neto, Dorgival Olavo Guedes ; Zaki, Mohammed Javeed. / The ParTriCluster algorithm for gene expression analysis. In: International Journal of Parallel Programming. 2008 ; Vol. 36, No. 2. pp. 226-249.

@article{7caa0c0622cc4b4bbf7afe9c472fc443,
title = "The ParTriCluster algorithm for gene expression analysis",
abstract = "Analyzing gene expression patterns is becoming a highly relevant task in the Bioinformatics area. This analysis makes it possible to determine the behavior patterns of genes under various conditions, which is fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, which is the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even though biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck due to its NP-completeness, so its parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate the implementation of a parallel version of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.",
keywords = "Bioinformatics, Clustering, Depth-first search, Parallel programming",
author = "Ara{\'u}jo, {Renata Braga} and Ferreira, {Guilherme Henrique Trielli} and Orair, {Gustavo Henrique} and Meira, Wagner and Ferreira, {Renato Ant{\^o}nio Celso} and Neto, {Dorgival Olavo Guedes} and Zaki, {Mohammed Javeed}",
year = "2008",
month = "4",
day = "1",
doi = "10.1007/s10766-007-0067-9",
language = "English",
volume = "36",
pages = "226--249",
journal = "International Journal of Parallel Programming",
issn = "0885-7458",
publisher = "Springer New York",
number = "2"
}

TY  - JOUR
T1  - The ParTriCluster algorithm for gene expression analysis
AU  - Araújo, Renata Braga
AU  - Ferreira, Guilherme Henrique Trielli
AU  - Orair, Gustavo Henrique
AU  - Meira, Wagner
AU  - Ferreira, Renato Antônio Celso
AU  - Neto, Dorgival Olavo Guedes
AU  - Zaki, Mohammed Javeed
PY  - 2008/4/1
Y1  - 2008/4/1
AB  - Analyzing gene expression patterns is becoming a highly relevant task in the Bioinformatics area. This analysis makes it possible to determine the behavior patterns of genes under various conditions, which is fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, which is the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even though biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck due to its NP-completeness, so its parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate the implementation of a parallel version of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
KW  - Bioinformatics
KW  - Clustering
KW  - Depth-first search
KW  - Parallel programming
UR  - http://www.scopus.com/inward/record.url?scp=42149119531&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=42149119531&partnerID=8YFLogxK
U2  - 10.1007/s10766-007-0067-9
DO  - 10.1007/s10766-007-0067-9
M3  - Article
VL  - 36
SP  - 226
EP  - 249
JO  - International Journal of Parallel Programming
JF  - International Journal of Parallel Programming
SN  - 0885-7458
IS  - 2
ER  - 