A hybrid method for the exact planted (l, d) motif finding problem and its parallelization

Mostafa Abbas, Mohamed Abouelhoda, Hazem M. Bahig

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Background: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence of length l that differs from M in at most d positions. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d.

Results: In this paper we present a new efficient method to improve the performance of exact algorithms for the motif finding problem. Our method is composed of two main steps: first, we process q ≤ t sequences to find candidate motifs; second, we search for the candidate motifs in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates existing algorithms to achieve the best running time. We also show how the optimal value of q is determined. Our experimental results show a speed-up of about 24% over the best existing algorithm. Furthermore, we present a parallel version of our method running on a shared-memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the challenging (21, 8) instance using 8 processors in 20.42 hours, instead of the 6.68 days required by the serial version.

Conclusions: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.
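The two-step scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract says both steps use the best available algorithms, whereas the code below substitutes plain brute-force enumeration for step one and a linear scan for step two, purely to show the control flow. All function names (`hamming`, `occurs_within`, `candidate_motifs`, `hybrid_motif_search`) are hypothetical, and q is passed in by hand rather than chosen optimally.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq with len(motif) letters is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def candidate_motifs(seqs, l, d):
    """Step 1 (brute-force stand-in): all l-mers occurring within d mismatches in every sequence."""
    candidates = set()
    for letters in product(ALPHABET, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in seqs):
            candidates.add(motif)
    return candidates


def hybrid_motif_search(seqs, l, d, q):
    """Find candidates in the first q sequences, then keep those found in the remaining ones."""
    candidates = candidate_motifs(seqs[:q], l, d)  # step 1 on q <= t sequences
    return {m for m in candidates if all(occurs_within(m, s, d) for s in seqs[q:])}


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GGACGT"]  # toy input, not data from the paper
    print(hybrid_motif_search(seqs, l=3, d=0, q=2))
```

In this sketch the set of motifs returned does not depend on q; only the split of work between the two steps changes, which is why the choice of q affects running time rather than correctness.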

Original language: English
Article number: S10
Journal: BMC Bioinformatics
Volume: 13
Issue number: SUPPL.17
DOI: 10.1186/1471-2105-13-S17-S10
Publication status: Published - 13 Dec 2012
Externally published: Yes

Fingerprint

Hybrid Method
Parallelization
Exact Algorithms
Memory architecture
Speedup
DNA sequences
M-sequence
Shared Memory
Subsequence
DNA Sequence
Fast Algorithm
Software
Linearly
Integrate
Necessary
Experimental Results
Research
Experiments

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

A hybrid method for the exact planted (l, d) motif finding problem and its parallelization. / Abbas, Mostafa; Abouelhoda, Mohamed; Bahig, Hazem M.

In: BMC Bioinformatics, Vol. 13, No. SUPPL.17, S10, 13.12.2012.

@article{4dd8c008456448fd9107731b0ef4fe67,
title = "A hybrid method for the exact planted (l, d) motif finding problem and its parallelization",
abstract = "Background: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence of length l that differs from M in at most d positions. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. Results: In this paper we present a new efficient method to improve the performance of exact algorithms for the motif finding problem. Our method is composed of two main steps: first, we process q ≤ t sequences to find candidate motifs; second, we search for the candidate motifs in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates existing algorithms to achieve the best running time. We also show how the optimal value of q is determined. Our experimental results show a speed-up of about 24{\%} over the best existing algorithm. Furthermore, we present a parallel version of our method running on a shared-memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the challenging (21, 8) instance using 8 processors in 20.42 hours, instead of the 6.68 days required by the serial version. Conclusions: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.",
author = "Mostafa Abbas and Mohamed Abouelhoda and Bahig, {Hazem M.}",
year = "2012",
month = "12",
day = "13",
doi = "10.1186/1471-2105-13-S17-S10",
language = "English",
volume = "13",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.17",
}

TY - JOUR

T1 - A hybrid method for the exact planted (l, d) motif finding problem and its parallelization

AU - Abbas, Mostafa

AU - Abouelhoda, Mohamed

AU - Bahig, Hazem M.

PY - 2012/12/13

Y1 - 2012/12/13

AB - Background: Given a set of DNA sequences s1, ..., st, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily occurring in any of the input sequences, such that each sequence si, 1 ≤ i ≤ t, contains at least one subsequence of length l that differs from M in at most d positions. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. Results: In this paper we present a new efficient method to improve the performance of exact algorithms for the motif finding problem. Our method is composed of two main steps: first, we process q ≤ t sequences to find candidate motifs; second, we search for the candidate motifs in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates existing algorithms to achieve the best running time. We also show how the optimal value of q is determined. Our experimental results show a speed-up of about 24% over the best existing algorithm. Furthermore, we present a parallel version of our method running on a shared-memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the challenging (21, 8) instance using 8 processors in 20.42 hours, instead of the 6.68 days required by the serial version. Conclusions: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.

UR - http://www.scopus.com/inward/record.url?scp=84877004097&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877004097&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-S17-S10

DO - 10.1186/1471-2105-13-S17-S10

M3 - Article

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.17

M1 - S10

ER -