Trellis+: An effective approach for indexing genome-scale sequences using suffix trees

Benjarath Phoophakdee, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

With advances in high-throughput sequencing methods and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval and analysis. In this paper we present a novel disk-based suffix tree approach, called TRELLIS+, that effectively scales to massive amounts of sequence data using only a limited amount of main memory, based on a novel string buffering strategy. We show experimentally that TRELLIS+ outperforms existing suffix tree approaches; it is able to index genome-scale sequences (e.g., the entire human genome), and it also allows rapid query processing over the disk-based index. Availability: TRELLIS+ source code is available online at

Original language: English
Title of host publication: Pacific Symposium on Biocomputing 2008, PSB 2008
Pages: 90-101
Number of pages: 12
Publication status: Published - 1 Dec 2008
Externally published: Yes
Event: 13th Pacific Symposium on Biocomputing, PSB 2008 - Kohala Coast, HI, United States
Duration: 4 Jan 2008 to 8 Jan 2008

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
  • Medicine(all)

Cite this

Phoophakdee, B., & Zaki, M. J. (2008). Trellis+: An effective approach for indexing genome-scale sequences using suffix trees. In Pacific Symposium on Biocomputing 2008, PSB 2008 (pp. 90-101)

@inproceedings{e8a3c14690bf4c27840c98d240215ee2,
title = "Trellis+: An effective approach for indexing genome-scale sequences using suffix trees",
abstract = "With advances in high-throughput sequencing methods, and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval and analysis. In this paper we present a novel disk-based suffix tree approach, called TRELLIS+, that effectively scales to massive amount of sequence data using only a limited amount of main-memory, based on a novel string buffering strategy. We show experimentally that TRELLIS+ outperforms existing suffix tree approaches; it is able to index genome-scale sequences (e.g., the entire Human genome), and it also allows rapid query processing over the disk-based index. Availability: TRELLIS+ source code is available online at",
author = "Phoophakdee, Benjarath and Zaki, {Mohammed J.}",
year = "2008",
month = "12",
day = "1",
language = "English",
isbn = "9812776087",
pages = "90--101",
booktitle = "Pacific Symposium on Biocomputing 2008, PSB 2008",

}

TY - GEN

T1 - Trellis+: An effective approach for indexing genome-scale sequences using suffix trees

AU - Phoophakdee, Benjarath

AU - Zaki, Mohammed J.

PY - 2008/12/1

Y1 - 2008/12/1

N2 - With advances in high-throughput sequencing methods, and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval and analysis. In this paper we present a novel disk-based suffix tree approach, called TRELLIS+, that effectively scales to massive amount of sequence data using only a limited amount of main-memory, based on a novel string buffering strategy. We show experimentally that TRELLIS+ outperforms existing suffix tree approaches; it is able to index genome-scale sequences (e.g., the entire Human genome), and it also allows rapid query processing over the disk-based index. Availability: TRELLIS+ source code is available online at

UR - http://www.scopus.com/inward/record.url?scp=40549085012&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=40549085012&partnerID=8YFLogxK

M3 - Conference contribution

C2 - 18229678

AN - SCOPUS:40549085012

SN - 9812776087

SN - 9789812776082

SP - 90

EP - 101

BT - Pacific Symposium on Biocomputing 2008, PSB 2008

ER -