Parallel Sequence Mining on Shared-Memory Machines

Mohammed J. Zaki

Research output: Contribution to journal › Article

68 Citations (Scopus)

Abstract

We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.
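The abstract's two scheduling ideas, independent classes and dynamic load balancing between them, can be illustrated with a small sketch. This is not the paper's implementation: the function `mine_classes`, the `solve` callback, and the dictionary layout of the classes are all hypothetical, class size stands in as a rough work estimate, and Python threads stand in for processors. Each worker pulls the next unprocessed class from a shared queue, so a worker that finishes early simply takes more work.

```python
import queue
import threading


def mine_classes(classes, solve, num_workers=4):
    """Process independent classes with dynamic inter-class load balancing.

    classes: dict mapping a class label to its list of members
             (hypothetical layout, assumed for illustration).
    solve:   function(label, members) -> result; each call is independent,
             so workers need no synchronization while solving a class.
    """
    tasks = queue.Queue()
    # Queue the largest classes first so a heavy class is not left for last.
    for label, members in sorted(classes.items(), key=lambda kv: -len(kv[1])):
        tasks.put((label, members))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                label, members = tasks.get_nowait()
            except queue.Empty:
                return  # no classes left: this worker is done
            out = solve(label, members)
            with lock:  # only the shared result table is protected
                results[label] = out

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Toy input: three classes of different sizes; "solving" just sums members.
    toy = {"A": [1, 2, 3], "B": [4], "C": [5, 6]}
    print(mine_classes(toy, lambda label, members: sum(members), num_workers=2))
```

The shared queue is the only point of coordination in this sketch. Balancing work inside a single large class, which the abstract also mentions, would need the class itself to be split into further tasks and is not shown here.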

Original language: English
Pages (from-to): 401-426
Number of pages: 26
Journal: Journal of Parallel and Distributed Computing
Volume: 61
Issue number: 3
DOIs: 10.1006/jpdc.2000.1695
Publication status: Published - 1 Mar 2001
Externally published: Yes

Fingerprint

Shared Memory
Mining
Data storage equipment
Parallel algorithms
Resource allocation
Synchronization
Suffix
Scale-up
Load Balancing
Search Space
Join
Speedup
Decompose
Experiments
Class

Keywords

  • Knowledge discovery; data mining; sequential patterns; frequent sequences; temporal association rules

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

Parallel Sequence Mining on Shared-Memory Machines. / Zaki, Mohammed J.

In: Journal of Parallel and Distributed Computing, Vol. 61, No. 3, 01.03.2001, p. 401-426.

@article{a23cfba1324e4f9fa722ab0e449ef381,
title = "Parallel Sequence Mining on Shared-Memory Machines",
abstract = "We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.",
keywords = "Knowledge discovery; data mining; sequential patterns; frequent sequences; temporal association rules",
author = "Zaki, {Mohammed J.}",
year = "2001",
month = "3",
day = "1",
doi = "10.1006/jpdc.2000.1695",
language = "English",
volume = "61",
pages = "401--426",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "3",
}

TY  - JOUR
T1  - Parallel Sequence Mining on Shared-Memory Machines
AU  - Zaki, Mohammed J.
PY  - 2001/3/1
Y1  - 2001/3/1
N2  - We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.
AB  - We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.
KW  - Knowledge discovery; data mining; sequential patterns; frequent sequences; temporal association rules
UR  - http://www.scopus.com/inward/record.url?scp=0348201974&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0348201974&partnerID=8YFLogxK
U2  - 10.1006/jpdc.2000.1695
DO  - 10.1006/jpdc.2000.1695
M3  - Article
AN  - SCOPUS:0348201974
VL  - 61
SP  - 401
EP  - 426
JO  - Journal of Parallel and Distributed Computing
JF  - Journal of Parallel and Distributed Computing
SN  - 0743-7315
IS  - 3
ER  -