Motivation: Recent studies suggested that a combination of multiple single nucleotide polymorphisms (SNPs) could have more significant associations with a specific phenotype. However, to discover epistasis, the epistatic interactions of SNPs, in a large number of SNPs, is a computationally challenging task. We are, therefore, motivated to develop efficient and effective solutions for identifying epistatic interactions of SNPs.Results: In this article, we propose an efficient Cloud-based Epistasis cOmputing (eCEO) model for large-scale epistatic interaction in genome-wide association study (GWAS). Given a large number of combinations of SNPs, our eCEO model is able to distribute them to balance the load across the processing nodes. Moreover, our eCEO model can efficiently process each combination of SNPs to determine the significance of its association with the phenotype. We have implemented and evaluated our eCEO model on our own cluster of more than 40 nodes. The experiment results demonstrate that the eCEO model is computationally efficient, flexible, scalable and practical. In addition, we have also deployed our eCEO model on the Amazon Elastic Compute Cloud. Our study further confirms its efficiency and ease of use in a public cloud.
ASJC Scopus subject areas
- Molecular Biology
- Computational Theory and Mathematics
- Computer Science Applications
- Computational Mathematics
- Statistics and Probability