An Efficient Algorithm to Identify DNA Motifs

Mostafa Abbas, Hazem M. Bahig

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We consider the problem of identifying motifs, which abstracts the task of finding short conserved sites in genomic DNA. The planted (l, d)-motif problem (PMP) is the mathematical abstraction of this task: find a substring of length l that occurs, with at most d substitutions, in every sequence s_i of a set of input sequences S = {s_1, s_2, ..., s_t}. Our proposed algorithm combines the voting algorithm with a pattern matching algorithm to find exact motifs. The combined algorithm first runs the voting algorithm on t′ sequences, t′ < t, and then applies pattern matching to the output of the voting algorithm and the remaining t - t′ sequences. Two values of t′ are calculated: the first makes the running time of the proposed algorithm less than that of the voting algorithm, and the second makes the running time of the proposed algorithm minimal. We show that the proposed algorithm is faster than the voting algorithm by testing both algorithms on simulated data for instances from (9, d ≤ 2) to (19, d ≤ 7). Finally, we test the performance of the combined algorithm on realistic biological data.
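The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`neighbors`, `voting_candidates`, `combined_search`) are hypothetical, a plain set intersection stands in for the vote counting of the voting algorithm, a brute-force scan stands in for the pattern matching step, and `t_prime` is passed in by hand rather than computed from the running-time analysis.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d):
    """All strings over ALPHABET within Hamming distance d of lmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(ALPHABET, repeat=k):
                candidate = list(lmer)
                for pos, ch in zip(positions, letters):
                    candidate[pos] = ch
                result.add("".join(candidate))
    return result


def occurs_within(motif, seq, d):
    """True if some window of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def voting_candidates(seqs, l, d):
    """Phase 1 (simplified voting): l-mers that receive a vote from every
    sequence, i.e. that have a d-neighbor window in each of seqs."""
    candidates = None
    for seq in seqs:
        voted = set()
        for i in range(len(seq) - l + 1):
            voted |= neighbors(seq[i:i + l], d)
        candidates = voted if candidates is None else candidates & voted
    return candidates or set()


def combined_search(seqs, l, d, t_prime):
    """Vote on the first t_prime sequences, then check each surviving
    candidate against the remaining len(seqs) - t_prime sequences."""
    if not 1 <= t_prime <= len(seqs):
        raise ValueError("t_prime must be between 1 and the number of sequences")
    candidates = voting_candidates(seqs[:t_prime], l, d)
    return {m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs[t_prime:])}


if __name__ == "__main__":
    # Toy input: ACGT occurs exactly, as ACCT, and as AGGT.
    toy = ["TTACGTTT", "GGACCTGG", "CCAGGTCC"]
    print(sorted(combined_search(toy, l=4, d=1, t_prime=2)))
```

In this sketch, any `t_prime` yields the same motif set as voting on all sequences; only the split of work between the two phases changes, which is the quantity the two calculated values of t′ are meant to tune.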

Original language: English
Pages (from-to): 387-399
Number of pages: 13
Journal: Mathematics in Computer Science
Volume: 7
Issue number: 4
DOI: 10.1007/s11786-013-0165-6
Publication status: Published - 1 Dec 2013

Fingerprint

DNA
Efficient Algorithms
Voting
Pattern Matching
Matching Algorithm
Genomics
Substitution
Substitution reactions
Testing
Output

Keywords

  • DNA motifs
  • Exact algorithms
  • Pattern matching
  • Planted (l, d)-motif
  • Transcription factor binding sites

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computational Mathematics
  • Applied Mathematics

Cite this

An Efficient Algorithm to Identify DNA Motifs. / Abbas, Mostafa; Bahig, Hazem M.

In: Mathematics in Computer Science, Vol. 7, No. 4, 01.12.2013, p. 387-399.

Abbas, Mostafa ; Bahig, Hazem M. / An Efficient Algorithm to Identify DNA Motifs. In: Mathematics in Computer Science. 2013 ; Vol. 7, No. 4. pp. 387-399.
@article{546662a5f8144e1aabf75d4db335879a,
title = "An Efficient Algorithm to Identify DNA Motifs",
abstract = "We consider the problem of identifying motifs, which abstracts the task of finding short conserved sites in genomic DNA. The planted (l, d)-motif problem (PMP) is the mathematical abstraction of this task: find a substring of length l that occurs, with at most d substitutions, in every sequence s_i of a set of input sequences S = {s_1, s_2, ..., s_t}. Our proposed algorithm combines the voting algorithm with a pattern matching algorithm to find exact motifs. The combined algorithm first runs the voting algorithm on t′ sequences, t′ < t, and then applies pattern matching to the output of the voting algorithm and the remaining t - t′ sequences. Two values of t′ are calculated: the first makes the running time of the proposed algorithm less than that of the voting algorithm, and the second makes the running time of the proposed algorithm minimal. We show that the proposed algorithm is faster than the voting algorithm by testing both algorithms on simulated data for instances from (9, d ≤ 2) to (19, d ≤ 7). Finally, we test the performance of the combined algorithm on realistic biological data.",
keywords = "DNA motifs, Exact algorithms, Pattern matching, Planted (l, d)-motif, Transcription factor binding sites",
author = "Mostafa Abbas and Bahig, {Hazem M.}",
year = "2013",
month = "12",
day = "1",
doi = "10.1007/s11786-013-0165-6",
language = "English",
volume = "7",
pages = "387--399",
journal = "Mathematics in Computer Science",
issn = "1661-8270",
publisher = "Birkhauser Verlag Basel",
number = "4",

}

TY - JOUR

T1 - An Efficient Algorithm to Identify DNA Motifs

AU - Abbas, Mostafa

AU - Bahig, Hazem M.

PY - 2013/12/1

Y1 - 2013/12/1

N2 - We consider the problem of identifying motifs, which abstracts the task of finding short conserved sites in genomic DNA. The planted (l, d)-motif problem (PMP) is the mathematical abstraction of this task: find a substring of length l that occurs, with at most d substitutions, in every sequence s_i of a set of input sequences S = {s_1, s_2, ..., s_t}. Our proposed algorithm combines the voting algorithm with a pattern matching algorithm to find exact motifs. The combined algorithm first runs the voting algorithm on t′ sequences, t′ < t, and then applies pattern matching to the output of the voting algorithm and the remaining t - t′ sequences. Two values of t′ are calculated: the first makes the running time of the proposed algorithm less than that of the voting algorithm, and the second makes the running time of the proposed algorithm minimal. We show that the proposed algorithm is faster than the voting algorithm by testing both algorithms on simulated data for instances from (9, d ≤ 2) to (19, d ≤ 7). Finally, we test the performance of the combined algorithm on realistic biological data.

KW - DNA motifs

KW - Exact algorithms

KW - Pattern matching

KW - Planted (l, d)-motif

KW - Transcription factor binding sites

UR - http://www.scopus.com/inward/record.url?scp=84895910996&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84895910996&partnerID=8YFLogxK

U2 - 10.1007/s11786-013-0165-6

DO - 10.1007/s11786-013-0165-6

M3 - Article

VL - 7

SP - 387

EP - 399

JO - Mathematics in Computer Science

JF - Mathematics in Computer Science

SN - 1661-8270

IS - 4

ER -