Localized algorithm for parallel association mining

Mohammed Javeed Zaki, Srinivasan Parthasarathy, Wei Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Citations (Scopus)

Abstract

Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is a result of a detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together, and to partition them among the processors. At the same time it also uses a different database layout which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overheads. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and doesn't have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithm on various databases, and compare it against a well-known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.
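
The abstract mentions computing frequent itemsets with simple intersection operations rather than hash structures. As a rough illustration only, the sketch below assumes a vertical layout in which each item maps to the set of transaction ids containing it, so the support of a pair is the size of the intersection of two such sets. The layout, the function names `vertical_layout` and `frequent_pairs`, and the toy data are assumptions made for this example, not details taken from the paper; the sketch is sequential and does not model the clustering, partitioning, or replication steps.

```python
from itertools import combinations


def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tids) that contain it.

    Assumed layout for illustration; the abstract only says a
    "different database layout" is used.
    """
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def frequent_pairs(tidlists, min_support):
    """Return {pair: support} for 2-itemsets meeting min_support.

    Support is counted by intersecting the two items' tid sets,
    so no hash tree or candidate search structure is needed.
    """
    frequent = {}
    for a, b in combinations(sorted(tidlists), 2):
        common = tidlists[a] & tidlists[b]
        if len(common) >= min_support:
            frequent[(a, b)] = len(common)
    return frequent


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
tidlists = vertical_layout(transactions)
pairs = frequent_pairs(tidlists, min_support=2)
```

In this toy database each pair of items co-occurs in exactly two transactions, so all three pairs pass a minimum support of 2 and none pass a minimum support of 3.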

Original language: English
Title of host publication: Annual ACM Symposium on Parallel Algorithms and Architectures
Editors: Anon
Place of Publication: New York, NY, United States
Publisher: ACM
Pages: 321-330
Number of pages: 10
Publication status: Published - 1 Jan 1997
Externally published: Yes
Event: Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA - Newport, RI, USA
Duration: 22 Jun 1997 to 25 Jun 1997

Other

Other: Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA
City: Newport, RI, USA
Period: 22/6/97 to 25/6/97

Fingerprint

Association reactions
Association rules
Synchronization
Testbeds
Parallel algorithms
Data storage equipment
Communication
Costs

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Cite this

Zaki, M. J., Parthasarathy, S., & Li, W. (1997). Localized algorithm for parallel association mining. In Anon (Ed.), Annual ACM Symposium on Parallel Algorithms and Architectures (pp. 321-330). New York, NY, United States: ACM.

Localized algorithm for parallel association mining. / Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Li, Wei.

Annual ACM Symposium on Parallel Algorithms and Architectures. ed. / Anon. New York, NY, United States : ACM, 1997. p. 321-330.

Zaki, MJ, Parthasarathy, S & Li, W 1997, Localized algorithm for parallel association mining. in Anon (ed.), Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, New York, NY, United States, pp. 321-330, Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA, Newport, RI, USA, 22/6/97.
Zaki MJ, Parthasarathy S, Li W. Localized algorithm for parallel association mining. In Anon, editor, Annual ACM Symposium on Parallel Algorithms and Architectures. New York, NY, United States: ACM. 1997. p. 321-330
Zaki, Mohammed Javeed ; Parthasarathy, Srinivasan ; Li, Wei. / Localized algorithm for parallel association mining. Annual ACM Symposium on Parallel Algorithms and Architectures. editor / Anon. New York, NY, United States : ACM, 1997. pp. 321-330
@inproceedings{9b9abd192446407293e2b8cfdd2cb063,
title = "Localized algorithm for parallel association mining",
abstract = "Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is a result of a detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together, and to partition them among the processors. At the same time it also uses a different database layout which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overheads. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and doesn't have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithm on various databases, and compare it against a well-known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.",
author = "Zaki, {Mohammed Javeed} and Srinivasan Parthasarathy and Wei Li",
year = "1997",
month = "1",
day = "1",
language = "English",
pages = "321--330",
editor = "Anon",
booktitle = "Annual ACM Symposium on Parallel Algorithms and Architectures",
publisher = "ACM",

}

TY - GEN

T1 - Localized algorithm for parallel association mining

AU - Zaki, Mohammed Javeed

AU - Parthasarathy, Srinivasan

AU - Li, Wei

PY - 1997/1/1

Y1 - 1997/1/1

AB - Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is a result of a detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together, and to partition them among the processors. At the same time it also uses a different database layout which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overheads. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and doesn't have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithm on various databases, and compare it against a well-known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.

UR - http://www.scopus.com/inward/record.url?scp=0030686158&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030686158&partnerID=8YFLogxK

M3 - Conference contribution

SP - 321

EP - 330

BT - Annual ACM Symposium on Parallel Algorithms and Architectures

A2 - Anon

PB - ACM

CY - New York, NY, United States

ER -