Highly scalable Ab initio genomic motif identification

Benoît Marchand, Vladimir B. Bajic, Dinesh K. Kaushik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of the IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is relevant to many problems in the life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. The algorithm scaled very well (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), and the final speedup with respect to the original serial code exceeded 250,000 once serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, whereas the original serial code would have needed decades of execution time.
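The efficiency figure in the abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes the usual definition of relative parallel efficiency for a fixed-size problem (measured speedup divided by the ideal speedup implied by the core-count ratio), and the function names and the timings in the usage lines are hypothetical, not values reported by the authors.

```python
def relative_efficiency(t_base, t_scaled, p_base, p_scaled):
    """Parallel efficiency of a run on p_scaled cores relative to a
    baseline run on p_base cores, for the same problem size.

    Assumed definition (not taken from the paper):
        efficiency = (t_base / t_scaled) / (p_scaled / p_base)
    """
    measured_speedup = t_base / t_scaled
    ideal_speedup = p_scaled / p_base
    return measured_speedup / ideal_speedup


def implied_speedup(efficiency, p_base, p_scaled):
    """Speedup over the baseline run implied by a given relative efficiency."""
    return efficiency * (p_scaled / p_base)


# Core counts and the 94% figure come from the abstract.
P_BASE, P_SCALED = 256, 65536
ideal = P_SCALED / P_BASE                           # 256x more cores
implied = implied_speedup(0.94, P_BASE, P_SCALED)   # 0.94 * 256

print(f"ideal speedup over the 256-core run:   {ideal:.2f}")
print(f"implied speedup at 94% efficiency:     {implied:.2f}")

# Hypothetical timings (seconds), chosen only to exercise the formula.
t_256 = 1000.0
t_65536 = t_256 / implied
print(f"efficiency recovered from the timings: "
      f"{relative_efficiency(t_256, t_65536, P_BASE, P_SCALED):.2f}")
```

Under this definition, 94% efficiency means the 65,536-core run is roughly 240 times faster than the 256-core run, against an ideal of 256 times. The overall speedup of more than 250,000 quoted in the abstract is a separate figure, measured against the original serial code and including the serial optimizations.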

Original language: English
Title of host publication: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
DOIs: https://doi.org/10.1145/2063384.2063459
Publication status: Published - 2011
Externally published: Yes
Event: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 - Seattle, WA, United States
Duration: 12 Nov 2011 - 18 Nov 2011

Keywords

  • Data-flow parallel processing
  • Master-slave MPI parallel processing
  • Mixed-mode MPI-OpenMP parallel processing
  • Multi-level MPI collective operations
  • Multi-level workload distribution

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Marchand, B., Bajic, V. B., & Kaushik, D. K. (2011). Highly scalable Ab initio genomic motif identification. In Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis [56] https://doi.org/10.1145/2063384.2063459