We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques and simple join operations. Furthermore, each class can be solved independently on each processor requiring no synchronization. However, dynamic interclass and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup and excellent scaleup results.
- Knowledge discovery; data mining; sequential patterns; frequent sequences; temporal association rules
ASJC Scopus subject areas
- Computer Science Applications
- Hardware and Architecture
- Control and Systems Engineering