Analyzing gene expression patterns is becoming a highly relevant task in bioinformatics. This analysis makes it possible to determine how genes behave under various conditions, which is fundamental information for treating diseases, among other applications. A recent advance in this area is the Tricluster algorithm, the first algorithm capable of determining 3D clusters (genes × samples × timestamps), that is, groups of genes that behave similarly across samples and timestamps. However, even as biological experiments collect an increasing amount of data to be analyzed and correlated, the triclustering problem remains a bottleneck because of its NP-completeness, so parallelization seems to be an essential step towards obtaining feasible solutions. In this work we propose and evaluate a parallel implementation of the Tricluster algorithm using the filter-labeled-stream paradigm supported by the Anthill parallel programming environment. The results show that our parallelization scales well with the data size and is able to handle the severe load imbalances that are inherent to the problem. Furthermore, the parallelization strategy is applicable to any depth-first search.
- Depth-first search
- Parallel programming
ASJC Scopus subject areas
- Theoretical Computer Science
- Computational Theory and Mathematics