### Abstract

Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is the result of a detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together and to partition them among the processors. At the same time it uses a different database layout, which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overhead. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and does not have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithm on various databases and compare it against a well-known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.
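To make the idea of computing frequent itemsets with "simple intersection operations" concrete, here is a minimal sketch. It assumes a vertical layout in which each item maps to the set of transaction IDs that contain it. The abstract does not spell out the layout or the clustering scheme, so the function names (`to_vertical`, `frequent_pairs`, `cluster_by_prefix`), the prefix-based grouping, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations


def to_vertical(transactions):
    """Map each item to the set of transaction IDs (its tid-list) containing it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def frequent_pairs(tidlists, min_support):
    """Find frequent 2-itemsets by intersecting the tid-lists of frequent items."""
    frequent = {}
    items = sorted(i for i, tids in tidlists.items() if len(tids) >= min_support)
    for a, b in combinations(items, 2):
        common = tidlists[a] & tidlists[b]  # support = size of the intersection
        if len(common) >= min_support:
            frequent[(a, b)] = common
    return frequent


def cluster_by_prefix(pairs):
    """Group itemsets sharing a first item (an assumed clustering scheme).

    Each group could then be assigned to one processor and extended
    independently, since it only needs the tid-lists of its own members.
    """
    clusters = {}
    for a, b in pairs:
        clusters.setdefault(a, []).append((a, b))
    return clusters


# Toy database: five transactions over items A, B, C (made-up data).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "C"},
]
tidlists = to_vertical(transactions)
pairs = frequent_pairs(tidlists, min_support=3)
clusters = cluster_by_prefix(pairs)
```

No hash tree or candidate-counting structure is needed in this sketch: the support of an itemset is just the size of a set intersection, and each cluster can be processed without consulting the others.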

Original language | English |
---|---|
Title of host publication | Annual ACM Symposium on Parallel Algorithms and Architectures |
Editors | Anon |
Place of Publication | New York, NY, United States |
Publisher | ACM |
Pages | 321-330 |
Number of pages | 10 |
Publication status | Published - 1 Jan 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA - Newport, RI, USA; Duration: 22 Jun 1997 → 25 Jun 1997 |

### Other

Other | Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA |
---|---|
City | Newport, RI, USA |
Period | 22/6/97 → 25/6/97 |

### ASJC Scopus subject areas

- Software
- Safety, Risk, Reliability and Quality

### Cite this

Zaki, M. J., Parthasarathy, S., & Li, W. (1997). Localized algorithm for parallel association mining. In *Annual ACM Symposium on Parallel Algorithms and Architectures* (pp. 321-330). New York, NY, United States: ACM.

**Localized algorithm for parallel association mining.** / Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Li, Wei.

*Annual ACM Symposium on Parallel Algorithms and Architectures.* ACM, New York, NY, United States, pp. 321-330, Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA, Newport, RI, USA, 22/6/97.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Localized algorithm for parallel association mining

AU - Zaki, Mohammed Javeed

AU - Parthasarathy, Srinivasan

AU - Li, Wei

PY - 1997/1/1

Y1 - 1997/1/1

AB - Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is the result of a detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together and to partition them among the processors. At the same time it uses a different database layout, which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overhead. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and does not have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster interconnected by the Memory Channel network. We present results on the performance of our algorithm on various databases and compare it against a well-known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.

UR - http://www.scopus.com/inward/record.url?scp=0030686158&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030686158&partnerID=8YFLogxK

M3 - Conference contribution

SP - 321

EP - 330

BT - Annual ACM Symposium on Parallel Algorithms and Architectures

A2 - Anon

PB - ACM

CY - New York, NY, United States

ER -