Prism

An effective approach for frequent sequence mining via prime-block encoding

Karam Gouda, Mosab Hassaan, Mohammed J. Zaki

Research output: Contribution to journal › Article

37 Citations (Scopus)

Abstract

Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach for mining frequent sequences, called Prism. It utilizes a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods like SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002], by an order of magnitude or more.
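
To make the idea of encoding bit blocks through prime factorization concrete, here is a minimal sketch. It is not the authors' implementation: the block size of 8, the choice of the first eight primes, and the function names (`encode_block`, `decode_block`, `intersect`, `support`) are all assumptions made for illustration. Each block of 0/1 flags is stored as the product of the primes at its set positions, so the intersection of two blocks reduces to a greatest common divisor.

```python
from math import gcd

# Assumption for illustration: blocks of 8 bits, one prime per bit position.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def encode_block(bits):
    """Encode a block of 0/1 flags as the product of the primes at set positions."""
    value = 1  # an all-zero block is encoded as 1 (the empty product)
    for prime, bit in zip(PRIMES, bits):
        if bit:
            value *= prime
    return value


def decode_block(value):
    """Recover the 0/1 flags from an encoded block by testing divisibility."""
    return [1 if value % prime == 0 else 0 for prime in PRIMES]


def intersect(a, b):
    """Bitwise AND of two encoded blocks: the gcd keeps exactly the shared primes."""
    return gcd(a, b)


def support(encoded_blocks):
    """Count the set bits across encoded blocks (here: one bit per input sequence)."""
    return sum(sum(decode_block(value)) for value in encoded_blocks)


# Hypothetical data: which of 8 sequences contain item a and item b.
a = encode_block([1, 0, 1, 1, 0, 0, 1, 0])  # 2 * 5 * 7 * 17 = 1190
b = encode_block([1, 1, 0, 1, 0, 0, 0, 0])  # 2 * 3 * 7 = 42
both = intersect(a, b)                      # 2 * 7 = 14
print(decode_block(both), support([both]))  # [1, 0, 0, 1, 0, 0, 0, 0] 2
```

This sketch covers only the presence of items across sequences. Tracking the positions of items within each sequence, which sequence mining also requires, is left out.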

Original language: English
Pages (from-to): 88-102
Number of pages: 15
Journal: Journal of Computer and System Sciences
Volume: 76
Issue number: 1
DOIs: 10.1016/j.jcss.2009.05.008
Publication status: Published - 1 Feb 2010
Externally published: Yes

Fingerprint

Prism
Data mining
Mining
Encoding
Sequential Patterns
Factorization
Knowledge Discovery
Enumeration
Counting
Efficient Algorithms
Vertical
Engineering
Evaluation

Keywords

  • Data mining
  • Frequent sequence mining
  • Prime encoding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computational Theory and Mathematics
  • Theoretical Computer Science
  • Applied Mathematics

Cite this

Prism: An effective approach for frequent sequence mining via prime-block encoding. / Gouda, Karam; Hassaan, Mosab; Zaki, Mohammed J.

In: Journal of Computer and System Sciences, Vol. 76, No. 1, 01.02.2010, p. 88-102.

Gouda, Karam; Hassaan, Mosab; Zaki, Mohammed J. / Prism: An effective approach for frequent sequence mining via prime-block encoding. In: Journal of Computer and System Sciences. 2010; Vol. 76, No. 1. pp. 88-102.

@article{26dbd073a3714720b5c413d82e669126,
title = "Prism: An effective approach for frequent sequence mining via prime-block encoding",
abstract = "Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach for mining frequent sequences, called Prism. It utilizes a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods like SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002], by an order of magnitude or more.",
keywords = "Data mining, Frequent sequence mining, Prime encoding",
author = "Karam Gouda and Mosab Hassaan and Zaki, {Mohammed J.}",
year = "2010",
month = "2",
day = "1",
doi = "10.1016/j.jcss.2009.05.008",
language = "English",
volume = "76",
pages = "88--102",
journal = "Journal of Computer and System Sciences",
issn = "0022-0000",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR
T1 - Prism
T2 - An effective approach for frequent sequence mining via prime-block encoding
AU - Gouda, Karam
AU - Hassaan, Mosab
AU - Zaki, Mohammed J.
PY - 2010/2/1
Y1 - 2010/2/1
AB - Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach for mining frequent sequences, called Prism. It utilizes a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods like SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002], by an order of magnitude or more.
KW - Data mining
KW - Frequent sequence mining
KW - Prime encoding
UR - http://www.scopus.com/inward/record.url?scp=71749109571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=71749109571&partnerID=8YFLogxK
U2 - 10.1016/j.jcss.2009.05.008
DO - 10.1016/j.jcss.2009.05.008
M3 - Article
VL - 76
SP - 88
EP - 102
JO - Journal of Computer and System Sciences
JF - Journal of Computer and System Sciences
SN - 0022-0000
IS - 1
ER -