VOGUE: A novel variable order-gap state machine for modeling sequences

Bouchra Bouqata, Christopher D. Carothers, Boleslaw K. Szymanski, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

We present VOGUE, a new state machine that combines two separate techniques for modeling long-range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS) to mine frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity outperforms that of HMMER, a state-of-the-art method for protein classification.
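
The mining step described in the abstract can be illustrated with a small sketch. This is not the authors' VGS implementation: it assumes, for illustration only, that patterns are ordered pairs of symbols, that a gap is the number of symbols between the two elements, and that support is counted as the number of sequences containing the pair. The names `mine_gapped_pairs`, `max_gap`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def mine_gapped_pairs(sequences, max_gap, min_support):
    """Find ordered symbol pairs (a, b) where b follows a with at most
    max_gap symbols in between (illustrative sketch, not the paper's VGS).

    Returns {pair: {gap: occurrence_count}} for every pair that appears
    in at least min_support distinct sequences.
    """
    support = defaultdict(set)  # pair -> ids of sequences containing it
    gap_counts = defaultdict(lambda: defaultdict(int))  # pair -> gap -> count

    for seq_id, seq in enumerate(sequences):
        for i, first in enumerate(seq):
            for gap in range(max_gap + 1):
                j = i + gap + 1  # position of the second element
                if j >= len(seq):
                    break
                pair = (first, seq[j])
                support[pair].add(seq_id)
                gap_counts[pair][gap] += 1

    # Keep only pairs that are frequent across sequences.
    return {
        pair: dict(gap_counts[pair])
        for pair, ids in support.items()
        if len(ids) >= min_support
    }


if __name__ == "__main__":
    toy = ["ABC", "AXC", "AC"]
    print(mine_gapped_pairs(toy, max_gap=1, min_support=3))
```

On the toy input, only the pair ("A", "C") occurs in all three sequences, once with gap 0 and twice with gap 1. Per-gap counts of this kind are the sort of statistic a state machine could use to model how far apart two pattern elements tend to occur, though how VOGUE actually uses them is not stated in this record.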

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 42-54
Number of pages: 13
Volume: 4213 LNAI
Publication status: Published - 31 Oct 2006
Externally published: Yes
Event: 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2006 - Berlin, Germany
Duration: 18 Sep 2006 to 22 Sep 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4213 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

State Machine
Protein Classification
Proteins
Frequent Pattern
Data Modeling
Protein Sequence
Modeling
Markov Model
Mining
Data Mining
Hidden Markov models
Higher Order
Protein
Data structures
Range of data
Family

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Bouqata, B., Carothers, C. D., Szymanski, B. K., & Zaki, M. J. (2006). VOGUE: A novel variable order-gap state machine for modeling sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4213 LNAI, pp. 42-54). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4213 LNAI).

@inproceedings{e60991adca9545e7982f2c502137b508,
title = "VOGUE: A novel variable order-gap state machine for modeling sequences",
abstract = "We present VOGUE, a new state machine that combines two separate techniques for modeling long range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS), to mine frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity outperforms that of HMMER, a state-of-the-art method for protein classification.",
author = "Bouqata, Bouchra and Carothers, {Christopher D.} and Szymanski, {Boleslaw K.} and Zaki, {Mohammed J.}",
year = "2006",
month = "10",
day = "31",
language = "English",
isbn = "3540453741",
volume = "4213 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "42--54",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
}

TY  - GEN
T1  - VOGUE
T2  - A novel variable order-gap state machine for modeling sequences
AU  - Bouqata, Bouchra
AU  - Carothers, Christopher D.
AU  - Szymanski, Boleslaw K.
AU  - Zaki, Mohammed J.
PY  - 2006/10/31
Y1  - 2006/10/31
AB  - We present VOGUE, a new state machine that combines two separate techniques for modeling long range dependencies in sequential data: data mining and data modeling. VOGUE relies on a novel Variable-Gap Sequence mining method (VGS), to mine frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build the state machine. We applied VOGUE to the task of protein sequence classification on real data from the PROSITE protein families. We show that VOGUE yields significantly better scores than higher-order Hidden Markov Models. Moreover, we show that VOGUE's classification sensitivity outperforms that of HMMER, a state-of-the-art method for protein classification.
UR  - http://www.scopus.com/inward/record.url?scp=33750341634&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33750341634&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:33750341634
SN  - 3540453741
SN  - 9783540453741
VL  - 4213 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 42
EP  - 54
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -