PSIST: A scalable approach to indexing protein structures using suffix trees

Feng Gao, Mohammed J. Zaki

Research output: Contribution to journal › Article

6 Citations (Scopus)


Methods for indexing proteins, and for fast, scalable search for structures similar to a query structure, have important applications in protein structure and function prediction, protein classification, and drug discovery. In this paper, we develop a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are converted into a sequence of symbols, which is then indexed using a suffix tree. For a given query, the suffix tree index is used to retrieve the maximal matches, which are then chained to obtain local alignments. Finally, similar proteins are retrieved and ranked by their alignment score against the query. Our results show classification accuracy of up to 50% at the topology level and 92.9% at the class level, according to the CATH classification. These results outperform the best previous methods. We also show that PSIST is highly scalable because it uses an external suffix tree index; it is able to index about 70,500 domains from SCOP in under an hour.
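The pipeline described in the abstract (features to symbols, maximal matches, chaining, ranking by score) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the PSIST implementation: the function names, the fixed-width binning, and the scoring by total chained length are hypothetical, and a naive pairwise scan stands in for the suffix tree lookup, which would return the same maximal matches without comparing every pair of positions.

```python
def discretize(features, bin_width=10.0):
    """Map each feature vector to a symbol (a tuple of bin indices).
    The fixed bin width is an assumed, illustrative choice."""
    return [tuple(int(x // bin_width) for x in vec) for vec in features]


def maximal_matches(query, target, min_len=2):
    """Return (q_start, t_start, length) for every common run of symbols
    that cannot be extended to the left or to the right.
    Naive O(n*m) scan used here in place of a suffix tree."""
    matches = []
    for q in range(len(query)):
        for t in range(len(target)):
            if query[q] != target[t]:
                continue
            # Not maximal if the match can be extended to the left.
            if q > 0 and t > 0 and query[q - 1] == target[t - 1]:
                continue
            n = 0
            while (q + n < len(query) and t + n < len(target)
                   and query[q + n] == target[t + n]):
                n += 1
            if n >= min_len:
                matches.append((q, t, n))
    return matches


def chain_score(matches):
    """Total length of the best chain of non-overlapping matches that
    appear in the same order in both sequences (simple quadratic DP)."""
    matches = sorted(matches)
    best = [n for (_, _, n) in matches]
    for i, (qi, ti, ni) in enumerate(matches):
        for j in range(i):
            qj, tj, nj = matches[j]
            if qj + nj <= qi and tj + nj <= ti:
                best[i] = max(best[i], best[j] + ni)
    return max(best, default=0)


def rank(query, database):
    """Order database entries by chained-match score, best first."""
    scores = {name: chain_score(maximal_matches(query, seq))
              for name, seq in database.items()}
    return sorted(scores, key=lambda name: -scores[name])


if __name__ == "__main__":
    query = list("ABCXDEF")
    database = {"p1": list("ABCYYDEF"), "p2": list("ZZABZZ")}
    print(rank(query, database))
```

In this toy example single letters play the role of symbols; in practice each symbol would come from `discretize` applied to a protein's local feature vectors.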

Original language: English
Pages (from-to): 54-63
Number of pages: 10
Journal: Journal of Parallel and Distributed Computing
Issue number: 1
Publication status: Published - 1 Jan 2008
Externally published: Yes



Keywords

  • Bioinformatics
  • External suffix trees
  • Protein structure indexing

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering
