Approaches for indexing proteins and for fast, scalable search for structures similar to a query structure have important applications, such as protein structure and function prediction, protein classification, and drug discovery. In this paper, we develop PSIST, a new method for extracting local structural (or geometric) features from protein structures. These feature vectors are in turn converted into a set of symbols, which are then indexed using a suffix tree. For a given query, the suffix tree index can be used to retrieve the maximal matches efficiently; these matches are then chained to obtain the local alignments. Finally, similar proteins are retrieved by ranking them on their alignment score against the query. Our results show classification accuracy of up to 50% at the topology level and 92.9% at the class level of the CATH classification, outperforming the best previous methods. We also show that PSIST is highly scalable because of the external suffix tree indexing approach it uses: it indexes about 70,500 domains from SCOP in under an hour.
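The retrieval pipeline described above (symbols, maximal matches, chaining, ranking) can be illustrated with a small sketch. The abstract does not specify the feature definitions, the symbol alphabet, or the chaining score, so everything here is an assumption. Feature vectors are quantized into fixed-width bins (a hypothetical scheme). Maximal exact matches are found by brute force rather than with PSIST's external suffix tree. A chain is scored simply as the total length of co-linear, non-overlapping matches. The names `symbolize`, `maximal_matches`, `chain_score` and `rank` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Match = Tuple[int, int, int]  # (query_pos, target_pos, length)


def symbolize(features: Sequence[Sequence[float]], bins: int = 4,
              lo: float = 0.0, hi: float = 1.0) -> List[Tuple[int, ...]]:
    """Hypothetical discretization: map each feature vector to a tuple of bin indices."""
    width = (hi - lo) / bins
    symbols = []
    for vec in features:
        # Clamp into [0, bins - 1] so values outside [lo, hi] still get a symbol.
        symbols.append(tuple(min(bins - 1, max(0, int((x - lo) // width))) for x in vec))
    return symbols


def maximal_matches(q: Sequence, t: Sequence, min_len: int = 2) -> List[Match]:
    """Brute-force maximal exact matches (a suffix tree would avoid the O(n*m) scan)."""
    matches = []
    for i in range(len(q)):
        for j in range(len(t)):
            # Skip mismatches and matches that could be extended one symbol to the left.
            if q[i] != t[j] or (i > 0 and j > 0 and q[i - 1] == t[j - 1]):
                continue
            k = 0
            # Extend to the right as far as the symbols agree (right-maximal).
            while i + k < len(q) and j + k < len(t) and q[i + k] == t[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


def chain_score(matches: List[Match]) -> int:
    """Best total length of a chain of co-linear, non-overlapping matches (O(k^2) DP)."""
    ms = sorted(matches)
    best = [length for _, _, length in ms]
    for b, (qb, tb, lb) in enumerate(ms):
        for a in range(b):
            qa, ta, la = ms[a]
            # Match a may precede match b only if it ends before b starts in both sequences.
            if qa + la <= qb and ta + la <= tb:
                best[b] = max(best[b], best[a] + lb)
    return max(best, default=0)


def rank(query: Sequence[Sequence[float]],
         database: Dict[str, Sequence[Sequence[float]]],
         min_len: int = 2) -> List[Tuple[str, int]]:
    """Score every database entry against the query and sort by descending score."""
    q = symbolize(query)
    scores = {name: chain_score(maximal_matches(q, symbolize(feats), min_len))
              for name, feats in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `maximal_matches("ABCXDEF", "ABCYDEF")` yields two matches of length 3 that chain to a score of 6, whereas two matches that cross each other cannot be chained. Here plain strings stand in for symbol sequences.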
- External suffix trees
- Protein structure indexing