VOGUE

A variable order hidden Markov model with duration based on frequent sequence mining

Mohammed J. Zaki, Christopher D. Carothers, Boleslaw K. Szymanski

Research output: Contribution to journal (article)

29 Citations (Scopus)

Abstract

We present VOGUE, a novel variable order hidden Markov model with state durations that combines two separate techniques for modeling complex patterns in sequential data: pattern mining and data modeling. VOGUE relies on a variable gap sequence mining method to extract frequent patterns with different lengths and gaps between elements. It then uses these mined sequences to build a variable order hidden Markov model (HMM) that explicitly models the gaps. The gaps implicitly model the order of the HMM, and they explicitly model the duration of each state. We apply VOGUE to a variety of real sequence data taken from domains such as protein sequence classification, Web usage logs, intrusion detection, and spelling correction. We show that VOGUE has superior classification accuracy compared to regular HMMs, higher-order HMMs, and even special-purpose HMMs like HMMER, which is a state-of-the-art method for protein classification. The VOGUE implementation and the datasets used in this article are available as open source.
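
To make the first step of the abstract concrete, the following is a minimal sketch of variable-gap mining restricted to patterns of length two. It is not the authors' implementation: the function name `mine_gapped_pairs`, the parameters `max_gap` and `min_support`, and the choice to define support as the number of sequences containing a pair are all assumptions made for illustration. The sketch also records how often each gap length occurs for a pair, which is the kind of information a model would need if gaps are to stand in for state durations.

```python
from collections import Counter


def mine_gapped_pairs(sequences, max_gap, min_support):
    """Find ordered symbol pairs (a, b) where b follows a with at most
    `max_gap` symbols in between.

    Support (an illustrative choice) is the number of sequences that
    contain the pair at least once. For each frequent pair, the result
    also holds a histogram of the gap lengths that were observed.
    """
    support = Counter()
    gap_counts = {}  # (a, b) -> Counter mapping gap length -> occurrences
    for seq in sequences:
        seen = set()  # pairs found in this sequence, counted once for support
        for i, a in enumerate(seq):
            # b may sit at most max_gap + 1 positions after a
            for j in range(i + 1, min(i + max_gap + 2, len(seq))):
                pair = (a, seq[j])
                gap_counts.setdefault(pair, Counter())[j - i - 1] += 1
                seen.add(pair)
        support.update(seen)
    return {
        pair: (support[pair], dict(gap_counts[pair]))
        for pair in support
        if support[pair] >= min_support
    }


if __name__ == "__main__":
    # Hypothetical toy data; real inputs would be protein or log sequences.
    toy = ["ABAC", "ABC", "ACB"]
    for pair, (sup, gaps) in mine_gapped_pairs(toy, max_gap=1, min_support=3).items():
        print(pair, "support:", sup, "gap histogram:", gaps)
```

Longer patterns and the construction of HMM states from the mined pairs are outside the scope of this sketch; the article itself describes how those steps are carried out.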

Original language: English
Article number: 5
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 4
Issue number: 1
DOI: 10.1145/1644873.1644878
Publication status: Published - 1 Jan 2010
Externally published: Yes

Fingerprint

Hidden Markov models
Proteins
Intrusion detection
Data structures

Keywords

  • Hidden Markov models
  • Higher-order HMM
  • HMM with duration
  • Sequence mining and modeling
  • Variable-order HMM

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

VOGUE: A variable order hidden Markov model with duration based on frequent sequence mining. / Zaki, Mohammed J.; Carothers, Christopher D.; Szymanski, Boleslaw K.

In: ACM Transactions on Knowledge Discovery from Data, Vol. 4, No. 1, 5, 01.01.2010.

@article{95737b49d9f742729e20ace01366e35d,
title = "VOGUE: A variable order hidden Markov model with duration based on frequent sequence mining",
keywords = "Hidden Markov models, Higher-order HMM, HMM with duration, Sequence mining and modeling, Variable-order HMM",
author = "Zaki, {Mohammed J.} and Carothers, {Christopher D.} and Szymanski, {Boleslaw K.}",
year = "2010",
month = "1",
day = "1",
doi = "10.1145/1644873.1644878",
language = "English",
volume = "4",
journal = "ACM Transactions on Knowledge Discovery from Data",
issn = "1556-4681",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY - JOUR

T1 - VOGUE

T2 - A variable order hidden Markov model with duration based on frequent sequence mining

AU - Zaki, Mohammed J.

AU - Carothers, Christopher D.

AU - Szymanski, Boleslaw K.

PY - 2010/1/1

Y1 - 2010/1/1

KW - Hidden Markov models

KW - Higher-order HMM

KW - HMM with duration

KW - Sequence mining and modeling

KW - Variable-order HMM

UR - http://www.scopus.com/inward/record.url?scp=77955634103&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955634103&partnerID=8YFLogxK

U2 - 10.1145/1644873.1644878

DO - 10.1145/1644873.1644878

M3 - Article

VL - 4

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

SN - 1556-4681

IS - 1

M1 - 5

ER -