Parallel sequence mining on shared-memory machines

Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Further, each class can be solved independently on each processor, requiring no synchronization. However, dynamic inter-class and intra-class load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.
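The decomposition and inter-class load balancing described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of the last item as the class key, and the placeholder per-class work are all assumptions made for illustration. Classes are handed to a worker pool in order of decreasing size, so a worker that finishes early picks up the next pending class without any coordination between classes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_suffix(sequences):
    """Partition sequences into classes keyed by their last item.

    Assumption: a length-1 suffix is used as the class key.
    """
    classes = defaultdict(list)
    for seq in sequences:
        classes[seq[-1]].append(seq)
    return dict(classes)


def solve_class(suffix, members):
    """Hypothetical stand-in for the per-class search.

    A real miner would run joins within the class; here we only
    report the class size so the scheduling can be observed.
    """
    return suffix, len(members)


def mine(sequences, workers=4):
    classes = group_by_suffix(sequences)
    # Submit the largest classes first; idle workers then pull the
    # remaining smaller classes from the pool's queue as they finish.
    ordered = sorted(classes.items(), key=lambda kv: len(kv[1]), reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(solve_class, s, m) for s, m in ordered]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    data = [("A", "B"), ("C", "B"), ("A", "C")]
    print(mine(data, workers=2))
```

Because each class is processed on its own, the tasks share no mutable state and need no locks. The thread pool here only demonstrates the scheduling pattern; in CPython, CPU-bound work would need processes or native code to run in parallel.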

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 161-189
Number of pages: 29
Volume: 1759
ISBN (Print): 3540671943, 9783540671947
Publication status: Published - 2002
Externally published: Yes
Event: 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999 - San Diego, United States
Duration: 15 Aug 1999 to 15 Aug 1999

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1759
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Fingerprint

Shared Memory
Mining
Data storage equipment
Parallel algorithms
Resource allocation
Synchronization
Scale-up
Load Balancing
Experiments
Search Space
Join
Speedup
Class
Decompose

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Zaki, M. J. (2002). Parallel sequence mining on shared-memory machines. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1759, pp. 161-189). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1759). Springer Verlag.

@inproceedings{b6e1592dc0d845c5a1072b13e49dec4d,
title = "Parallel sequence mining on shared-memory machines",
abstract = "We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Further, each class can be solved independently on each processor, requiring no synchronization. However, dynamic inter-class and intra-class load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.",
author = "Zaki, {Mohammed J.}",
year = "2002",
language = "English",
isbn = "3540671943",
volume = "1759",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "161--189",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Parallel sequence mining on shared-memory machines

AU - Zaki, Mohammed J.

PY - 2002

Y1 - 2002

N2 - We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main memory using efficient search techniques and simple join operations. Further, each class can be solved independently on each processor, requiring no synchronization. However, dynamic inter-class and intra-class load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12-processor SGI Origin 2000 shared-memory system show good speedup and excellent scaleup results.

UR - http://www.scopus.com/inward/record.url?scp=84949484447&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949484447&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84949484447

SN - 3540671943

SN - 9783540671947

VL - 1759

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 161

EP - 189

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -