Indexing protein structures using suffix trees

Feng Gao, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

Indexing proteins and searching quickly and scalably for structures similar to a query structure have important applications, such as protein structure and function prediction, protein classification, and drug discovery. In this chapter, we describe a new method for extracting local feature vectors from protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between their Cα atoms and the angles between the normals of the planes in which their triangles lie. The normalized local feature vectors are indexed using a suffix tree. For each query segment, the suffix tree can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment scores against the query. Our results show classification accuracies of up to 97.8% and 99.4% at the superfamily and class levels, respectively, of the SCOP classification, and show that, on average, 7.49 of the top 10 matches are proteins from the same superfamily as the query. These results outperform the best previous methods.
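The retrieval pipeline summarized above (triangle-based local features, discretized symbols, maximal matches, chaining, ranking) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The following are all assumptions made for illustration: the triangle of residue i is taken as the Cα atoms of residues i-1, i and i+1; one feature is computed per residue pair (i, i + offset); features are binned into symbols; maximal matches are found by brute force as a stand-in for the suffix-tree query; and the chain score is simply the total matched length. The names `feature_string`, `maximal_matches`, `chain_score` and `rank_database` and their parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
Symbol = Tuple[int, int]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def triangle_normal(ca: Sequence[Vec], i: int) -> Vec:
    """Unit normal of residue i's triangle.
    ASSUMPTION: the triangle is (CA[i-1], CA[i], CA[i+1])."""
    n = _cross(_sub(ca[i], ca[i - 1]), _sub(ca[i + 1], ca[i]))
    length = _norm(n)
    if length == 0.0:
        raise ValueError(f"collinear C-alpha atoms around residue {i}")
    return (n[0] / length, n[1] / length, n[2] / length)


def feature_string(ca: Sequence[Vec], offset: int = 3, d_bin: float = 2.0,
                   a_bin: float = 30.0, n_d: int = 10, n_a: int = 6) -> List[Symbol]:
    """One symbol per residue pair (i, i + offset): a binned C-alpha distance and
    a binned angle between triangle normals (hypothetical discretization)."""
    symbols = []
    for i in range(1, len(ca) - 1 - offset):
        j = i + offset
        dist = _norm(_sub(ca[i], ca[j]))
        ni, nj = triangle_normal(ca, i), triangle_normal(ca, j)
        cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(ni, nj))))
        angle = math.degrees(math.acos(cos))
        symbols.append((min(int(dist / d_bin), n_d - 1),
                        min(int(angle / a_bin), n_a - 1)))
    return symbols


def maximal_matches(q: Sequence, d: Sequence, min_len: int) -> List[Tuple[int, int, int]]:
    """Maximal exact matches (q_start, d_start, length) with length >= min_len.
    Brute-force stand-in for a suffix-tree query, O(len(q) * len(d)) start pairs."""
    out = []
    for qi in range(len(q)):
        for di in range(len(d)):
            if q[qi] != d[di]:
                continue
            if qi > 0 and di > 0 and q[qi - 1] == d[di - 1]:
                continue  # extendable to the left, so not maximal
            length = 0
            while (qi + length < len(q) and di + length < len(d)
                   and q[qi + length] == d[di + length]):
                length += 1  # extend right until a mismatch or sequence end
            if length >= min_len:
                out.append((qi, di, length))
    return out


def chain_score(matches: List[Tuple[int, int, int]]) -> int:
    """Best total length of a chain of matches that appear in the same order and
    do not overlap in either the query or the database sequence (O(m^2) DP)."""
    ms = sorted(matches)
    best: List[int] = []
    for k, (qk, dk, lk) in enumerate(ms):
        prev = 0
        for m in range(k):
            qm, dm, lm = ms[m]
            if qm + lm <= qk and dm + lm <= dk:
                prev = max(prev, best[m])
        best.append(prev + lk)
    return max(best, default=0)


def rank_database(query_ca: Sequence[Vec], database: Dict[str, Sequence[Vec]],
                  min_len: int = 3) -> List[Tuple[str, int]]:
    """Score every database structure against the query, highest score first."""
    q = feature_string(query_ca)
    scores = [(name, chain_score(maximal_matches(q, feature_string(ca), min_len)))
              for name, ca in database.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Called as `rank_database(query_coords, {"name": coords, ...})` with lists of Cα coordinates, an ideal helical query scores highest against a helical database entry and zero against an extended strand, because their binned distance/angle symbols never coincide. A real index would build a suffix tree over all database feature strings once, so that each query segment retrieves its maximal matches without rescanning the database; the quadratic loop here only keeps the example short.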

Original language: English
Title of host publication: Methods in Molecular Biology
Pages: 147-169
Number of pages: 23
Volume: 413
Publication status: Published 5 Oct 2007
Externally published: Yes

Publication series

Name: Methods in Molecular Biology
Volume: 413
ISSN (Print): 1064-3745

Keywords

  • 3D database search
  • Approximate matches
  • Protein structure indexing
  • Structural motifs
  • Suffix trees

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Gao, F., & Zaki, M. J. (2007). Indexing protein structures using suffix trees. In Methods in Molecular Biology (Vol. 413, pp. 147-169). (Methods in Molecular Biology; Vol. 413).

@inbook{9b34c1177c3c43f4bfdf650894443771,
title = "Indexing protein structures using suffix trees",
keywords = "3D database search, Approximate matches, Protein structure indexing, Structural motifs, Suffix trees",
author = "Feng Gao and Zaki, {Mohammed J.}",
year = "2007",
month = "10",
day = "5",
language = "English",
isbn = "1597455741",
volume = "413",
series = "Methods in Molecular Biology",
pages = "147--169",
booktitle = "Methods in Molecular Biology",

}

TY - CHAP

T1 - Indexing protein structures using suffix trees

AU - Gao, Feng

AU - Zaki, Mohammed J.

PY - 2007/10/5

Y1 - 2007/10/5

KW - 3D database search

KW - Approximate matches

KW - Protein structure indexing

KW - Structural motifs

KW - Suffix trees

UR - http://www.scopus.com/inward/record.url?scp=37149040696&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=37149040696&partnerID=8YFLogxK

M3 - Chapter

SN - 1597455741

SN - 9781597455749

VL - 413

T3 - Methods in Molecular Biology

SP - 147

EP - 169

BT - Methods in Molecular Biology

ER -