Efficient algorithms for mining closed itemsets and their lattice structure

Mohammed J. Zaki, Ching Jui Hsiao

Research output: Contribution to journal › Article

333 Citations (Scopus)

Abstract

The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
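The abstract names CHARM's main ingredients: a vertical itemset-tidset representation, closure tests made by comparing tidsets, diffsets, and a hash-based check that discards non-closed sets. The Python below is a minimal sketch of a CHARM-style search under stated assumptions. It is not the authors' implementation. Transactions are assumed to be iterables of hashable items, and `minsup` is assumed to be an absolute transaction count. The search runs on plain tidsets rather than diffsets (the `diffset` helper only demonstrates the identity), and it uses a simple depth-first order instead of the paper's hybrid search. The non-closed check keys its hash table on the sum of tids, which is one simple choice assumed here. The four tidset-comparison cases follow CHARM's itemset-tidset properties as commonly described. The names `charm`, `_extend`, `_add_if_closed` and `diffset` are hypothetical.

```python
from collections import defaultdict


def charm(transactions, minsup):
    """Return (closed_itemset, support) pairs with support >= minsup.

    Sketch of a CHARM-style search over itemset-tidset pairs; minsup is an
    absolute transaction count (an assumption of this sketch).
    """
    # Vertical layout: item -> set of transaction ids (tids) containing it.
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    nodes = [(frozenset([item]), frozenset(tids))
             for item, tids in vertical.items() if len(tids) >= minsup]
    # Process items in increasing order of support (a common heuristic).
    nodes.sort(key=lambda node: len(node[1]))
    closed = {}  # hash table: sum of tids -> list of (itemset, tidset)
    _extend(nodes, minsup, closed)
    return [(itemset, len(tids))
            for bucket in closed.values() for itemset, tids in bucket]


def _extend(nodes, minsup, closed):
    """Explore one class of (itemset, tidset) nodes, mutating the list."""
    i = 0
    while i < len(nodes):
        xi, ti = nodes[i]
        children = []
        j = i + 1
        while j < len(nodes):
            xj, tj = nodes[j]
            y = ti & tj  # tidset of the combined itemset xi | xj
            if len(y) < minsup:
                j += 1
                continue
            if ti == tj:
                # Property 1: identical tidsets; absorb xj into xi, drop xj.
                del nodes[j]
                xi = xi | xj
                children = [(x | xj, t) for x, t in children]
            elif ti < tj:
                # Property 2: every transaction with xi also has xj; absorb xj.
                xi = xi | xj
                children = [(x | xj, t) for x, t in children]
                j += 1
            elif ti > tj:
                # Property 3: every transaction with xj has xi; drop xj and
                # explore xi | xj as a child instead.
                del nodes[j]
                children.append((xi | xj, y))
            else:
                # Property 4: incomparable tidsets; xi | xj is a new child.
                children.append((xi | xj, y))
                j += 1
        if children:
            _extend(children, minsup, closed)  # record children before parent
        _add_if_closed(xi, ti, closed)
        i += 1


def _add_if_closed(itemset, tidset, closed):
    """Skip itemset if an already-found superset has the same tidset."""
    bucket = closed.setdefault(sum(tidset), [])  # assumed key: sum of tids
    if any(t == tidset and itemset <= c for c, t in bucket):
        return
    bucket.append((itemset, tidset))


def diffset(t_px, t_py):
    """d(PXY) = t(PX) - t(PY); support(PXY) = support(PX) - len(d(PXY))."""
    return t_px - t_py


if __name__ == "__main__":
    # Toy database invented for illustration; each string is one transaction.
    db = ["ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT"]
    for itemset, support in sorted(charm(db, 3),
                                   key=lambda p: (-p[1], sorted(p[0]))):
        print("".join(sorted(itemset)), support)
```

On the toy database with `minsup=3`, the sketch should report seven closed sets: C, CW, CD, CT, ACW, CDW and ACTW. Recording each node only after its children have been explored lets the hash check compare every new set against closed supersets that are already in the table. The `diffset` identity shows why diffsets can replace tidsets: the support of PXY follows from the support of PX minus the size of the (often much smaller) difference set.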

Original language: English
Pages (from-to): 462-478
Number of pages: 17
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 17
Issue number: 4
DOIs: 10.1109/TKDE.2005.60
Publication status: Published - 1 Apr 2005
Externally published: Yes

Keywords

  • Association rules
  • Closed itemset lattice
  • Closed itemsets
  • Data mining
  • Frequent itemsets

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Efficient algorithms for mining closed itemsets and their lattice structure. / Zaki, Mohammed J.; Hsiao, Ching Jui.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, 01.04.2005, p. 462-478.

@article{f37d117eb0bc4e41ad9c2c8b3cef2b18,
title = "Efficient algorithms for mining closed itemsets and their lattice structure",
abstract = "The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any {"}nonclosed{"} sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.",
keywords = "Association rules, Closed itemset lattice, Closed itemsets, Data mining, Frequent itemsets",
author = "Zaki, {Mohammed J.} and Hsiao, {Ching Jui}",
year = "2005",
month = "4",
day = "1",
doi = "10.1109/TKDE.2005.60",
language = "English",
volume = "17",
pages = "462--478",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "4",

}

TY  - JOUR
T1  - Efficient algorithms for mining closed itemsets and their lattice structure
AU  - Zaki, Mohammed J.
AU  - Hsiao, Ching Jui
PY  - 2005/4/1
Y1  - 2005/4/1
N2  - The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
AB  - The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
KW  - Association rules
KW  - Closed itemset lattice
KW  - Closed itemsets
KW  - Data mining
KW  - Frequent itemsets
UR  - http://www.scopus.com/inward/record.url?scp=17044438212&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=17044438212&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2005.60
DO  - 10.1109/TKDE.2005.60
M3  - Article
AN  - SCOPUS:17044438212
VL  - 17
SP  - 462
EP  - 478
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
SN  - 1041-4347
IS  - 4
ER  -