SMOTIF: Efficient structured pattern and profile motif search

Yongqiang Zhang, Mohammed J. Zaki

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

Background: A structured motif consists of several components separated by variable-length gaps, where each component is a simple motif that allows either no gaps or only fixed-length gaps. The motif can be represented either as a pattern or as a profile (also called a positional weight matrix). We propose an efficient algorithm, called SMOTIF, to solve the structured motif search problem: given one or more sequences and a structured motif, SMOTIF searches the sequences for all occurrences of the motif. Potential applications include searching for long terminal repeat (LTR) retrotransposons and composite regulatory binding sites in DNA sequences.

Results: SMOTIF can search for both pattern and profile motifs, and it is efficient in both time and space; it outperforms SMARTFINDER, a state-of-the-art algorithm for structured motif search. Experimental results show that SMOTIF is about 7 times faster and consumes 100 times less memory than SMARTFINDER. It can effectively search for LTR retrotransposons and is well suited to searching for motifs with long-range gaps. It is also successful in finding potential composite transcription factor binding sites.

Conclusion: SMOTIF is a useful and efficient tool for searching for structured pattern and profile motifs. The algorithm is available as open source at: http://www.cs.rpi.edu/~zaki/software/sMotif/.
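To make the problem statement concrete, the following is a minimal sketch of structured pattern motif search. It is a naive backtracking search written for illustration only; it is not the SMOTIF algorithm and makes no claim to its time or memory behaviour. The function names, the use of 'N' as a wildcard for fixed-length gaps inside a component, and the (min, max) gap encoding are all assumptions introduced here, not taken from the paper or its software.

```python
from typing import Iterator, List, Tuple


def matches_at(seq: str, pos: int, component: str) -> bool:
    """Return True if `component` matches `seq` starting at index `pos`.

    Assumption: 'N' in a component matches any symbol, which models the
    fixed-length gaps allowed inside a simple motif.
    """
    if pos + len(component) > len(seq):
        return False
    window = seq[pos:pos + len(component)]
    return all(c == "N" or c == s for c, s in zip(component, window))


def search_structured(
    seq: str,
    components: List[str],
    gaps: List[Tuple[int, int]],
) -> Iterator[Tuple[int, ...]]:
    """Yield one tuple of component start positions per occurrence.

    gaps[i] = (min_len, max_len) is the allowed number of symbols between
    the end of component i and the start of component i + 1.
    """
    assert len(gaps) == len(components) - 1

    def extend(idx: int, start: int, acc: Tuple[int, ...]):
        # Try to place component `idx` at `start`, then recurse on the rest.
        if not matches_at(seq, start, components[idx]):
            return
        acc = acc + (start,)
        if idx == len(components) - 1:
            yield acc
            return
        end = start + len(components[idx])
        min_gap, max_gap = gaps[idx]
        for gap in range(min_gap, max_gap + 1):
            yield from extend(idx + 1, end + gap, acc)

    for start in range(len(seq)):
        yield from extend(0, start, ())


# Hypothetical example: "ACG", then a gap of 1 to 3 symbols, then "GCA".
hits = list(search_structured("ACGTTGCATACGAGCA", ["ACG", "GCA"], [(1, 3)]))
print(hits)  # [(0, 5), (9, 13)]
```

Each result lists where every component starts, so a single first-component match can produce several occurrences when more than one gap length leads to a match. A profile motif would replace the exact comparison in `matches_at` with a per-position score checked against a threshold.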

Original language: English
Article number: 22
Journal: Algorithms for Molecular Biology
Volume: 1
Issue number: 1
DOIs: 10.1186/1748-7188-1-22
Publication status: Published - 21 Nov 2006
Externally published: Yes

Fingerprint

Retroelements
Terminal Repeat Sequences
Binding sites
Transcription factors
DNA sequences
Composite
Composite materials
Search Problems
Software
Open Source
Data storage equipment
Weights and Measures
Efficient Algorithms
Profile
Experimental Results
Range of data

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
  • Molecular Biology
  • Structural Biology

Cite this

SMOTIF: Efficient structured pattern and profile motif search. / Zhang, Yongqiang; Zaki, Mohammed J.

In: Algorithms for Molecular Biology, Vol. 1, No. 1, 22, 21.11.2006.

Zhang, Yongqiang; Zaki, Mohammed J. / SMOTIF: Efficient structured pattern and profile motif search. In: Algorithms for Molecular Biology. 2006; Vol. 1, No. 1.
@article{e985f4e441d54f0dbfdc134d5dd2374a,
title = "SMOTIF: Efficient structured pattern and profile motif search",
author = "Zhang, Yongqiang and Zaki, Mohammed J.",
year = "2006",
month = "11",
day = "21",
doi = "10.1186/1748-7188-1-22",
language = "English",
volume = "1",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - SMOTIF

T2 - Efficient structured pattern and profile motif search

AU - Zhang, Yongqiang

AU - Zaki, Mohammed J.

PY - 2006/11/21

Y1 - 2006/11/21

UR - http://www.scopus.com/inward/record.url?scp=34248361861&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34248361861&partnerID=8YFLogxK

U2 - 10.1186/1748-7188-1-22

DO - 10.1186/1748-7188-1-22

M3 - Article

AN - SCOPUS:34248361861

VL - 1

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

SN - 1748-7188

IS - 1

M1 - 22

ER -