SMpred

A support vector machine approach to identify structural motifs in protein structure without using evolutionary information

Ganesan Pugalenthi, Krishna Kumar Kandaswamy, P. N. Suganthan, R. Sowdhamini, Thomas Martinetz, Prasanna Kolatkar

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Knowledge of three-dimensional structure is essential for understanding the function of a protein. Although the overall fold is determined by the complete sequence, a small group of residues, often called structural motifs, plays a crucial role in determining the protein fold and its stability. Identifying such structural motifs requires a sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, many structures in protein structure databases have no homologous structures or sequences. In this work, we report a support vector machine (SVM) method, SMpred, that identifies structural motifs from a single protein structure without using sequence or structural homologs. SMpred was trained and tested on 132 protein domains containing 581 motifs, and achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was also evaluated against MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we show that SMpred is a useful approach for length-deviant superfamilies and single-member superfamilies. These results suggest that our approach can facilitate the identification of structural motifs in protein structures in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm are available at http://www3.ntu.edu.sg/home/EPNSugan/index-files/SMpred.htm.
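The abstract reports accuracy, sensitivity and specificity without spelling out how the three figures relate. As a minimal sketch, assuming the standard binary-classification definitions (the abstract does not say whether the unit of prediction is a residue or a whole motif), they can be computed from a confusion matrix as below. The class name `ConfusionMatrix` and the counts are hypothetical and purely illustrative; they are not SMpred's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a hypothetical binary motif / non-motif classifier."""
    tp: int  # motif examples predicted as motif
    fn: int  # motif examples predicted as non-motif
    tn: int  # non-motif examples predicted as non-motif
    fp: int  # non-motif examples predicted as motif

    def sensitivity(self) -> float:
        # Fraction of true motifs recovered: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of non-motifs correctly rejected: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all examples classified correctly: (TP + TN) / total
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


if __name__ == "__main__":
    # Illustrative counts only; these are NOT the SMpred results.
    cm = ConfusionMatrix(tp=80, fn=20, tn=75, fp=25)
    print(f"sensitivity = {cm.sensitivity():.2%}")
    print(f"specificity = {cm.specificity():.2%}")
    print(f"accuracy    = {cm.accuracy():.2%}")
```

With equal numbers of positive and negative examples (100 each here), accuracy reduces to the mean of sensitivity and specificity, so the script prints 80.00%, 75.00% and 77.50%. When the classes are unbalanced, accuracy is instead weighted toward whichever class has more examples.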

Original language: English
Pages (from-to): 405-414
Number of pages: 10
Journal: Journal of Biomolecular Structure and Dynamics
Volume: 28
Issue number: 3
Publication status: Published - 1 Dec 2010
Externally published: Yes

Fingerprint

Amino Acid Motifs
Sequence Homology
Protein Databases
Proteins
Sensitivity and Specificity
Support Vector Machine

Keywords

  • Fingerprint
  • Protein folding
  • Protein function
  • Structural motifs
  • Support vector machine

ASJC Scopus subject areas

  • Molecular Biology
  • Structural Biology

Cite this

SMpred: A support vector machine approach to identify structural motifs in protein structure without using evolutionary information. / Pugalenthi, Ganesan; Kandaswamy, Krishna Kumar; Suganthan, P. N.; Sowdhamini, R.; Martinetz, Thomas; Kolatkar, Prasanna.

In: Journal of Biomolecular Structure and Dynamics, Vol. 28, No. 3, 01.12.2010, p. 405-414.

Pugalenthi, Ganesan ; Kandaswamy, Krishna Kumar ; Suganthan, P. N. ; Sowdhamini, R. ; Martinetz, Thomas ; Kolatkar, Prasanna. / SMpred: A support vector machine approach to identify structural motifs in protein structure without using evolutionary information. In: Journal of Biomolecular Structure and Dynamics. 2010 ; Vol. 28, No. 3. pp. 405-414.
@article{3df1c23f23c340bf8a48e1594b21bfc3,
title = "SMpred: A support vector machine approach to identify structural motifs in protein structure without using evolutionary information",
keywords = "Fingerprint, Protein folding, Protein function, Structural motifs, Support vector machine",
author = "Ganesan Pugalenthi and Kandaswamy, {Krishna Kumar} and Suganthan, {P. N.} and R. Sowdhamini and Thomas Martinetz and Prasanna Kolatkar",
year = "2010",
month = "12",
day = "1",
language = "English",
volume = "28",
pages = "405--414",
journal = "Journal of Biomolecular Structure and Dynamics",
issn = "0739-1102",
publisher = "Adenine Press",
number = "3",

}

TY - JOUR

T1 - SMpred

T2 - A support vector machine approach to identify structural motifs in protein structure without using evolutionary information

AU - Pugalenthi, Ganesan

AU - Kandaswamy, Krishna Kumar

AU - Suganthan, P. N.

AU - Sowdhamini, R.

AU - Martinetz, Thomas

AU - Kolatkar, Prasanna

PY - 2010/12/1

Y1 - 2010/12/1

KW - Fingerprint

KW - Protein folding

KW - Protein function

KW - Structural motifs

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=78649857193&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649857193&partnerID=8YFLogxK

M3 - Article

VL - 28

SP - 405

EP - 414

JO - Journal of Biomolecular Structure and Dynamics

JF - Journal of Biomolecular Structure and Dynamics

SN - 0739-1102

IS - 3

ER -