SPADE: An efficient algorithm for mining frequent sequences

Mohammed J. Zaki

Research output: Contribution to journal (Article)

1208 Citations (Scopus)

Abstract

In this paper we present SPADE, a new algorithm for fast discovery of sequential patterns. Existing solutions to this problem make repeated database scans and use complex hash structures that have poor locality. SPADE uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently in main memory using efficient lattice search techniques and simple join operations. All sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also scales linearly with the number of input sequences and with a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
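
The abstract mentions "simple join operations" without describing them. As a rough illustration only, the sketch below assumes a vertical layout in which each item keeps an id-list of (sequence id, event id) pairs, and joins two id-lists to count how many input sequences contain one item followed later by another. The function names, the toy database, and the data layout are hypothetical choices made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def build_idlists(database):
    """Map each item to the set of (sid, eid) pairs where it occurs.

    `database` maps a sequence id to a list of (event id, set of items).
    This layout is an assumption made for the sketch.
    """
    idlists = defaultdict(set)
    for sid, events in database.items():
        for eid, items in events:
            for item in items:
                idlists[item].add((sid, eid))
    return idlists


def temporal_join(first, second):
    """Id-list for the pattern "first, then later second".

    Keeps each occurrence of `second` that comes strictly after
    the earliest occurrence of `first` in the same sequence.
    """
    earliest = {}
    for sid, eid in first:
        if sid not in earliest or eid < earliest[sid]:
            earliest[sid] = eid
    return {
        (sid, eid)
        for sid, eid in second
        if sid in earliest and eid > earliest[sid]
    }


def support(idlist):
    """Number of distinct input sequences that appear in the id-list."""
    return len({sid for sid, _ in idlist})


# Toy database (hypothetical): three input sequences with timestamped events.
database = {
    1: [(10, {"A", "B"}), (20, {"C"}), (30, {"A"})],
    2: [(10, {"C"}), (15, {"A"})],
    3: [(5, {"A"}), (25, {"C"})],
}

idlists = build_idlists(database)
a_then_c = temporal_join(idlists["A"], idlists["C"])
c_then_a = temporal_join(idlists["C"], idlists["A"])
print(support(a_then_c), support(c_then_a))
```

In this layout the support of a longer pattern is computed from the id-lists of shorter ones alone, so the original database does not need to be rescanned for each candidate.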

Original language: English
Pages (from-to): 31-60
Number of pages: 30
Journal: Machine Learning
Volume: 42
Issue number: 1-2
DOI: 10.1023/A:1007652502315
Publication status: Published - 29 Sep 2001
Externally published: Yes

Fingerprint

Scalability
Data storage equipment
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering

Cite this

SPADE: An efficient algorithm for mining frequent sequences. / Zaki, Mohammed J.

In: Machine Learning, Vol. 42, No. 1-2, 29.09.2001, p. 31-60.

Zaki, Mohammed J. / SPADE: An efficient algorithm for mining frequent sequences. In: Machine Learning. 2001; Vol. 42, No. 1-2. pp. 31-60.
@article{65c22d990fe34a9fbc35ca8028678326,
title = "SPADE: An efficient algorithm for mining frequent sequences",
author = "Zaki, {Mohammed J.}",
year = "2001",
month = "9",
day = "29",
doi = "10.1023/A:1007652502315",
language = "English",
volume = "42",
pages = "31--60",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1-2",

}

TY - JOUR

T1 - SPADE

T2 - An efficient algorithm for mining frequent sequences

AU - Zaki, Mohammed J.

PY - 2001/9/29

Y1 - 2001/9/29

UR - http://www.scopus.com/inward/record.url?scp=0034826102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034826102&partnerID=8YFLogxK

U2 - 10.1023/A:1007652502315

DO - 10.1023/A:1007652502315

M3 - Article

AN - SCOPUS:0034826102

VL - 42

SP - 31

EP - 60

JO - Machine Learning

JF - Machine Learning

SN - 0885-6125

IS - 1-2

ER -