PSIST: A scalable approach to indexing protein structures using suffix trees

Feng Gao, Mohammed J. Zaki

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Approaches for indexing proteins and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are in turn converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index can be used effectively to retrieve the maximal matches, which are then chained to obtain the local alignments. Finally, similar proteins are retrieved by their alignment score against the query. Our results show classification accuracy up to 50% and 92.9% at the topology and class level according to the CATH classification. These results outperform the best previous methods. We also show that PSIST is highly scalable due to the external suffix tree indexing approach it uses; it is able to index about 70,500 domains from SCOP in under an hour.
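The pipeline described in the abstract (feature vectors, symbols, maximal matches, chaining, ranking by score) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature extraction, the symbol encoding, or the scoring function, so the fixed-width binning, the function names (`discretize`, `maximal_matches`, `chain_score`, `rank`), the `bin_width` and `min_len` parameters, and the sum-of-lengths score are all assumptions made for illustration. A quadratic dynamic program also stands in for the external suffix tree, which is what gives PSIST its scalability.

```python
from typing import Dict, List, Sequence, Tuple

Symbol = Tuple[int, ...]
Match = Tuple[int, int, int]  # (query_start, target_start, length)


def discretize(features: Sequence[Sequence[float]], bin_width: float) -> List[Symbol]:
    """Map each real-valued feature vector to a symbol by binning every component.

    Assumption: fixed-width bins; the paper's actual encoding is not given here.
    """
    return [tuple(int(x // bin_width) for x in vec) for vec in features]


def maximal_matches(query: Sequence[Symbol], target: Sequence[Symbol],
                    min_len: int) -> List[Match]:
    """Return exact common runs of symbols that cannot be extended left or right.

    A suffix tree would report these without comparing every pair of positions;
    this stand-in uses a simple O(len(query) * len(target)) table.
    """
    prev = [0] * (len(target) + 1)
    found: List[Match] = []
    for i in range(1, len(query) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if query[i - 1] == target[j - 1]:
                # Length of the common run ending at query[i-1], target[j-1].
                cur[j] = prev[j - 1] + 1
                at_end = i == len(query) or j == len(target)
                # Right-maximal: the next symbols differ or a sequence ends.
                if cur[j] >= min_len and (at_end or query[i] != target[j]):
                    found.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return found


def chain_score(matches: List[Match]) -> int:
    """Best total length over chains of non-overlapping, co-linear matches."""
    ordered = sorted(matches)  # sorted by query start, then target start
    best = [length for _, _, length in ordered]
    for b, (qb, tb, lb) in enumerate(ordered):
        for a in range(b):
            qa, ta, la = ordered[a]
            # Match a may precede b only if it ends before b starts in both sequences.
            if qa + la <= qb and ta + la <= tb:
                best[b] = max(best[b], best[a] + lb)
    return max(best, default=0)


def rank(query_feats: Sequence[Sequence[float]],
         database: Dict[str, Sequence[Sequence[float]]],
         bin_width: float = 1.0, min_len: int = 2) -> List[Tuple[str, int]]:
    """Score every database entry against the query and sort by descending score."""
    q = discretize(query_feats, bin_width)
    scores = {
        name: chain_score(maximal_matches(q, discretize(feats, bin_width), min_len))
        for name, feats in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank(query_feats, database)` with a dictionary that maps a domain identifier to its list of feature vectors returns `(identifier, score)` pairs, highest score first. Replacing `maximal_matches` with a lookup against a prebuilt index is where a suffix tree would enter a real system.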

Original language: English
Pages (from-to): 54-63
Number of pages: 10
Journal: Journal of Parallel and Distributed Computing
Volume: 68
Issue number: 1
DOI: 10.1016/j.jpdc.2007.07.008
Publication status: Published - 1 Jan 2008
Externally published: Yes

Keywords

  • Bioinformatics
  • External suffix trees
  • Protein structure indexing

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

PSIST: A scalable approach to indexing protein structures using suffix trees. / Gao, Feng; Zaki, Mohammed J.

In: Journal of Parallel and Distributed Computing, Vol. 68, No. 1, 01.01.2008, p. 54-63.

@article{ec1f01e403084abf95dcb1fecef351f4,
title = "PSIST: A scalable approach to indexing protein structures using suffix trees",
abstract = "Approaches for indexing proteins and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are in turn converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index can be used effectively to retrieve the maximal matches, which are then chained to obtain the local alignments. Finally, similar proteins are retrieved by their alignment score against the query. Our results show classification accuracy up to 50{\%} and 92.9{\%} at the topology and class level according to the CATH classification. These results outperform the best previous methods. We also show that PSIST is highly scalable due to the external suffix tree indexing approach it uses; it is able to index about 70,500 domains from SCOP in under an hour.",
keywords = "Bioinformatics, External suffix trees, Protein structure indexing",
author = "Gao, Feng and Zaki, {Mohammed J.}",
year = "2008",
month = "1",
day = "1",
doi = "10.1016/j.jpdc.2007.07.008",
language = "English",
volume = "68",
pages = "54--63",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "1"
}

TY  - JOUR
T1  - PSIST: A scalable approach to indexing protein structures using suffix trees
AU  - Gao, Feng
AU  - Zaki, Mohammed J.
PY  - 2008/1/1
Y1  - 2008/1/1
AB  - Approaches for indexing proteins and for fast and scalable searching for structures similar to a query structure have important applications such as protein structure and function prediction, protein classification and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are in turn converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index can be used effectively to retrieve the maximal matches, which are then chained to obtain the local alignments. Finally, similar proteins are retrieved by their alignment score against the query. Our results show classification accuracy up to 50% and 92.9% at the topology and class level according to the CATH classification. These results outperform the best previous methods. We also show that PSIST is highly scalable due to the external suffix tree indexing approach it uses; it is able to index about 70,500 domains from SCOP in under an hour.
KW  - Bioinformatics
KW  - External suffix trees
KW  - Protein structure indexing
UR  - http://www.scopus.com/inward/record.url?scp=36649013506&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=36649013506&partnerID=8YFLogxK
U2  - 10.1016/j.jpdc.2007.07.008
DO  - 10.1016/j.jpdc.2007.07.008
M3  - Article
VL  - 68
SP  - 54
EP  - 63
JO  - Journal of Parallel and Distributed Computing
JF  - Journal of Parallel and Distributed Computing
SN  - 0743-7315
IS  - 1
ER  - 