Efficient filtration of sequence similarity search through singular value decomposition

S. Alireza Aghili, Ozgur D. Sahin, Divyakant Agrawal, Amr El Abbadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to mitigate the curse of dimensionality. This paper proposes a novel approach that maps the problem of whole-genome sequence similarity search into an approximate vector comparison in the well-established multidimensional vector space. We propose applying the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to reduce the search space and the running time of the search operation. Our empirical results on a prokaryote and a eukaryote DNA contig dataset demonstrate effective filtration that prunes non-relevant portions of the database, with up to 2.3 times faster running time than the q-gram approach. SVD filtration can easily be integrated as a pre-processing step into any of the well-known sequence search heuristics, such as BLAST, QUASAR and FastA. We analyze the precision of SVD filtration as a transformation-based dimensionality reduction technique and discuss the trade-offs it imposes.
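The filtration idea described in the abstract can be illustrated with a small sketch. It assumes (these are not details taken from the paper) that each sequence is represented by a vector of DNA q-gram counts and that similarity is measured by Euclidean distance between those vectors. The parameters `q`, `k` and `radius` and the helper names `qgram_vector`, `build_svd_filter` and `candidates` are hypothetical. The database matrix is decomposed with NumPy's `numpy.linalg.svd`, every vector is projected onto the top-k right singular vectors, and only sequences whose reduced-space distance to the query falls within the radius would be passed on to a full search tool.

```python
from itertools import product

import numpy as np

# Sketch under stated assumptions: q-gram count vectors and Euclidean distance.
# This illustrates SVD-based filtering in general, not the paper's exact method.


def qgram_vector(seq: str, q: int = 3) -> np.ndarray:
    """Count occurrences of every DNA q-gram (over A, C, G, T) in seq."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        j = index.get(seq[i:i + q])
        if j is not None:  # skip q-grams containing symbols outside A, C, G, T
            vec[j] += 1
    return vec


def build_svd_filter(db_seqs, q=3, k=8):
    """Project database q-gram vectors onto the top-k right singular vectors."""
    A = np.vstack([qgram_vector(s, q) for s in db_seqs])
    # Rows of vt are orthonormal and ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    basis = vt[:k].T           # shape (4**q, min(k, rank bound)), orthonormal columns
    return basis, A @ basis    # basis and reduced database vectors


def candidates(query, basis, reduced_db, radius, q=3):
    """Return indices whose reduced-space distance to the query is <= radius.

    Projection onto orthonormal columns never increases Euclidean distance,
    so any sequence within `radius` in the full q-gram space survives the
    filter (false positives are possible, false dismissals are not).
    """
    rq = qgram_vector(query, q) @ basis
    dists = np.linalg.norm(reduced_db - rq, axis=1)
    return [i for i, d in enumerate(dists) if d <= radius]
```

Because the reduced distances are lower bounds on the full q-gram distances, this kind of filter trades selectivity for speed: a smaller `k` makes the comparison cheaper but lets more non-matching sequences through, which is one way to picture the precision trade-off the abstract refers to.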

Original language: English
Title of host publication: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Pages: 403-410
Number of pages: 8
Publication status: Published - 24 Sep 2004
Externally published: Yes
Event: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 - Taichung, Taiwan, Province of China
Duration: 19 May 2004 to 21 May 2004

Other

Other: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Country: Taiwan, Province of China
City: Taichung
Period: 19/5/04 to 21/5/04

Fingerprint

Singular value decomposition
Bioinformatics
Vector spaces
Processing
DNA
Genes

Keywords

  • Approximate String Search
  • Bioinformatics
  • Comparative genomics
  • Sequence Homology
  • Singular Value Decomposition

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Aghili, S. A., Sahin, O. D., Agrawal, D., & El Abbadi, A. (2004). Efficient filtration of sequence similarity search through singular value decomposition. In Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 (pp. 403-410).

Efficient filtration of sequence similarity search through singular value decomposition. / Aghili, S. Alireza; Sahin, Ozgur D.; Agrawal, Divyakant; El Abbadi, Amr.

Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004. 2004. p. 403-410.

Aghili, SA, Sahin, OD, Agrawal, D & El Abbadi, A 2004, Efficient filtration of sequence similarity search through singular value decomposition. in Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004. pp. 403-410, Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004, Taichung, Taiwan, Province of China, 19/5/04.
Aghili SA, Sahin OD, Agrawal D, El Abbadi A. Efficient filtration of sequence similarity search through singular value decomposition. In Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004. 2004. p. 403-410
Aghili, S. Alireza ; Sahin, Ozgur D. ; Agrawal, Divyakant ; El Abbadi, Amr. / Efficient filtration of sequence similarity search through singular value decomposition. Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004. 2004. pp. 403-410
@inproceedings{29602947355f45f1a69df9c9a97e4021,
title = "Efficient filtration of sequence similarity search through singular value decomposition",
keywords = "Approximate String Search, Bioinformatics, Comparative genomics, Sequence Homology, Singular Value Decomposition",
author = "Aghili, {S. Alireza} and Sahin, {Ozgur D.} and Divyakant Agrawal and {El Abbadi}, Amr",
year = "2004",
month = "9",
day = "24",
language = "English",
isbn = "0769521738",
pages = "403--410",
booktitle = "Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004",

}

TY - GEN

T1 - Efficient filtration of sequence similarity search through singular value decomposition

AU - Aghili, S. Alireza

AU - Sahin, Ozgur D.

AU - Agrawal, Divyakant

AU - El Abbadi, Amr

PY - 2004/9/24

Y1 - 2004/9/24

KW - Approximate String Search

KW - Bioinformatics

KW - Comparative genomics

KW - Sequence Homology

KW - Singular Value Decomposition

UR - http://www.scopus.com/inward/record.url?scp=4544342630&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544342630&partnerID=8YFLogxK

M3 - Conference contribution

SN - 0769521738

SN - 9780769521732

SP - 403

EP - 410

BT - Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004

ER -