Parallel algorithms for discovery of association rules

Mohammed J. Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, Wei Li

Research output: Contribution to journal (Article)

177 Citations (Scopus)

Abstract

Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost. In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms use efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes, based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques, based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice: once in the set-up phase, and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithms on various databases, and compare them against a well-known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
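
The abstract's combination of a vertical database layout, intersection-based counting, and bottom-up search within a cluster can be illustrated with a small sequential sketch. This is only an illustration of the general idea, not the authors' implementation: the function names (`to_vertical`, `mine_class`, `frequent_itemsets`), the use of Python sets as transaction-id lists, and the treatment of each group of itemsets sharing a prefix as one class are all assumptions made here. Parallel distribution, selective replication, clique-based clustering, and hybrid search are not shown.

```python
def to_vertical(transactions):
    """Invert horizontal transactions into a map: item -> set of transaction ids."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def mine_class(members, min_support, out):
    """Bottom-up search inside one class (hypothetical helper).

    `members` is a list of (itemset_tuple, tidset) pairs whose itemsets
    share a common prefix and differ only in their last item.
    """
    for i, (iset_a, tids_a) in enumerate(members):
        next_class = []
        for iset_b, tids_b in members[i + 1:]:
            # Support of the joined itemset is the size of the tid-list
            # intersection; no hash structure is consulted.
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                candidate = iset_a + (iset_b[-1],)
                out[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            # All candidates built from iset_a share it as a prefix.
            mine_class(next_class, min_support, out)


def frequent_itemsets(transactions, min_support):
    """Return {itemset_tuple: support} for all itemsets meeting min_support."""
    tidlists = to_vertical(transactions)
    singles = [((item,), tids)
               for item, tids in sorted(tidlists.items())
               if len(tids) >= min_support]
    out = {iset: len(tids) for iset, tids in singles}
    mine_class(singles, min_support, out)
    return out


if __name__ == "__main__":
    db = [{"A", "C", "D"}, {"B", "C"}, {"A", "B", "C"}, {"A", "C"}]
    for itemset, support in sorted(frequent_itemsets(db, 2).items()):
        print(itemset, support)
```

In this sketch each top-level class could be mined independently once its tid-lists are available, which is in the spirit of the abstract's claim that no further communication is needed after set-up; how the paper actually schedules and distributes classes is not specified here.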

Original language: English
Pages (from-to): 343-373
Number of pages: 31
Journal: Data Mining and Knowledge Discovery
Volume: 1
Issue number: 4
Publication status: Published - 1 Dec 1997
Externally published: Yes

Keywords

  • Association rules
  • Lattice traversal
  • Maximal hypergraph cliques
  • Parallel data mining

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). Parallel algorithms for discovery of association rules. Data Mining and Knowledge Discovery, 1(4), 343-373.

@article{8536ce0a92fa4074be5f70d4fd9e209f,
title = "Parallel algorithms for discovery of association rules",
keywords = "Association rules, Lattice traversal, Maximal hypergraph cliques, Parallel data mining",
author = "Zaki, {Mohammed J.} and Srinivasan Parthasarathy and Mitsunori Ogihara and Wei Li",
year = "1997",
month = "12",
day = "1",
language = "English",
volume = "1",
pages = "343--373",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "4",

}

TY - JOUR

T1 - Parallel algorithms for discovery of association rules

AU - Zaki, Mohammed J.

AU - Parthasarathy, Srinivasan

AU - Ogihara, Mitsunori

AU - Li, Wei

PY - 1997/12/1

Y1 - 1997/12/1

KW - Association rules

KW - Lattice traversal

KW - Maximal hypergraph cliques

KW - Parallel data mining

UR - http://www.scopus.com/inward/record.url?scp=21944439686&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=21944439686&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:21944439686

VL - 1

SP - 343

EP - 373

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 4

ER -