Sequence similarity search using discrete Fourier and wavelet transformation techniques

S. Alireza Aghili, Divyakant Agrawal, Amr El Abbadi

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In this paper, we study the problem of sequence similarity search. We incorporate vector transformations and apply two dimensionality reduction techniques, the DFT (Discrete Fourier Transformation) and the DWT (Discrete Wavelet Transformation, Haar), to reduce the search space and search time of sequence similarity range queries. Our empirical results on a number of prokaryote and eukaryote DNA contig databases demonstrate up to a 50-fold reduction of the search space (filtration ratio) and up to 13 times faster filtration. The proposed transformation techniques can easily be integrated as a pre-processing phase on top of current similarity search heuristics and techniques such as BLAST, PatternHunter, FastA and QUASAR to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches and discuss the trade-offs involved.
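The pruning idea in the abstract rests on a standard property: an orthonormal transform, such as the DFT with 1/sqrt(n) scaling or the orthonormal Haar DWT, preserves Euclidean distance. Consequently, the distance computed on only the first c coefficients can never exceed the full distance. A range query can therefore discard any sequence whose truncated distance already exceeds the radius without losing a true answer, and it only needs to verify the survivors. The Python sketch below illustrates this filter-and-refine pattern under stated assumptions. Each sequence is mapped to a 2-mer count vector, a hypothetical stand-in for the paper's vector transformation. Similarity is Euclidean distance between those vectors. All function names (`kmer_vector`, `dft`, `haar`, `dist`, `range_query`) are hypothetical. This is not the authors' implementation.

```python
import cmath
import math
from itertools import product

def kmer_vector(seq, k=2):
    """Hypothetical vector transformation (an assumption, not the paper's):
    map a DNA string to its k-mer count vector of length 4**k (16 for k=2)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in index:  # skip windows containing N or other symbols
            vec[index[kmer]] += 1.0
    return vec

def dft(x):
    """Orthonormal DFT (1/sqrt(n) scaling), so Euclidean distance is preserved."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]

def haar(x):
    """Orthonormal Haar DWT; len(x) must be a power of two. Coefficients are
    ordered coarsest first: overall approximation, then details by level."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = level + details  # finer levels end up further right
    return approx + details

def dist(u, v):
    """Euclidean distance; works for real or complex coefficients."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, v)))

def range_query(db, query, eps, transform, c):
    """Filter-and-refine range query over a dict {id: sequence}.
    Returns (candidates that pass the filter, verified matches)."""
    q_vec = kmer_vector(query)
    q_red = transform(q_vec)[:c]  # reduced query: first c coefficients only
    candidates, matches = [], []
    for sid, seq in db.items():
        vec = kmer_vector(seq)  # a real system would precompute and index these
        # Filter: the truncated distance lower-bounds the full distance, so a
        # true match is never discarded (small slack absorbs rounding error).
        if dist(transform(vec)[:c], q_red) <= eps + 1e-9:
            candidates.append(sid)
            # Refine: exact check in the untransformed vector space.
            if dist(vec, q_vec) <= eps:
                matches.append(sid)
    return candidates, matches

db = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "GGGGCCCCGG"}
for transform in (dft, haar):
    print(transform.__name__, range_query(db, "ACGTACGTAC", eps=2.0, transform=transform, c=4))
```

With radius 2.0, both transforms report s1 (distance 0) and s2 (distance exactly 2.0) as matches, and s3 is not reported. The no-false-dismissal guarantee here holds only for Euclidean distance between the assumed 2-mer vectors. Connecting that distance to alignment scores, as a pre-filter for a tool such as BLAST would require, is outside the scope of this sketch.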

Original language: English
Pages (from-to): 733-754
Number of pages: 22
Journal: International Journal on Artificial Intelligence Tools
Volume: 14
Issue number: 5
Publication status: Published - 1 Oct 2005
Externally published: Yes


Keywords

  • Biological databases
  • Range query
  • Sequence similarity
  • Sequence transformation
  • String comparison

ASJC Scopus subject areas

  • Artificial Intelligence
