PSIST

Indexing protein structures using suffix trees

Feng Gao, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Citations (Scopus)

Abstract

Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we developed a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.
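
The pairwise geometry mentioned in the abstract can be illustrated with a minimal sketch. It assumes each residue's triangle is spanned by its backbone N, Cα and C atoms, which the abstract does not state, and all function names and coordinates are hypothetical. It computes only the Cα-Cα distance and the angle between the two triangle normals for one pair of residues; the normalization, suffix-tree indexing and chaining steps of the paper are not shown.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in a))

def triangle_normal(p, q, r):
    """Unit normal of the plane through the three points p, q, r."""
    n = cross(sub(q, p), sub(r, p))
    length = norm(n)
    if length == 0.0:
        raise ValueError("degenerate triangle: points are collinear")
    return tuple(x / length for x in n)

def pair_features(res_i, res_j):
    """Return (CA-CA distance, angle between triangle normals in radians).

    Each residue is a tuple (N, CA, C) of 3D coordinates; using these
    three atoms as the triangle is an assumption of this sketch.
    """
    distance = norm(sub(res_j[1], res_i[1]))
    n_i = triangle_normal(*res_i)
    n_j = triangle_normal(*res_j)
    cos_theta = sum(x * y for x, y in zip(n_i, n_j))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return distance, math.acos(cos_theta)

# Made-up coordinates, chosen so the two planes are perpendicular.
res_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
res_b = ((0.0, 0.0, 3.0), (0.0, 1.0, 3.0), (0.0, 0.0, 4.0))
distance, angle = pair_features(res_a, res_b)
print(round(distance, 3), round(math.degrees(angle), 1))
```

A feature vector for a window of residues could then collect these values for several residue pairs before normalization; how the paper selects the pairs and discretizes the values is not described in the abstract.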

Original language: English
Title of host publication: Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Pages: 212-222
Number of pages: 11
Volume: 2005
DOI: https://doi.org/10.1109/CSB.2005.46
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 - Stanford, CA, United States
Duration: 8 Aug 2005 to 11 Aug 2005

Other

Other: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Country: United States
City: Stanford, CA
Period: 8/8/05 to 11/8/05

Fingerprint

Proteins
Protein Databases
Drug Discovery
benzoylprop-ethyl
Atoms

ASJC Scopus subject areas

  • Engineering(all)
  • Medicine(all)

Cite this

Gao, F., & Zaki, M. J. (2005). PSIST: Indexing protein structures using suffix trees. In Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 (Vol. 2005, pp. 212-222). [1498023] https://doi.org/10.1109/CSB.2005.46

@inproceedings{e88ea042cc7e4bab8c568f4f7dbede02,
title = "PSIST: Indexing protein structures using suffix trees",
abstract = "Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we developed a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between C$\alpha$ atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8{\%} and 99.4{\%} at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.",
author = "Gao, Feng and Zaki, {Mohammed J.}",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/CSB.2005.46",
language = "English",
isbn = "0769523447",
volume = "2005",
pages = "212--222",
booktitle = "Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005",

}

TY - GEN

T1 - PSIST

T2 - Indexing protein structures using suffix trees

AU - Gao, Feng

AU - Zaki, Mohammed J.

PY - 2005/12/1

Y1 - 2005/12/1

AB - Approaches for indexing proteins and for fast, scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we developed a new method for extracting the local feature vectors of protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For all query segments, suffix trees can be used effectively to retrieve the maximal matches, which are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query. Our results show classification accuracy of up to 97.8% and 99.4% at the superfamily and class levels, respectively, according to the SCOP classification, and show that on average 7.49 out of 10 proteins from the same superfamily are obtained among the top 10 matches. These results are competitive with the best previous methods.

UR - http://www.scopus.com/inward/record.url?scp=33745485860&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745485860&partnerID=8YFLogxK

U2 - 10.1109/CSB.2005.46

DO - 10.1109/CSB.2005.46

M3 - Conference contribution

SN - 0769523447

SN - 9780769523446

VL - 2005

SP - 212

EP - 222

BT - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005

ER -