Fast vertical mining using diffsets

Mohammed J. Zaki, Karam Gouda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

343 Citations (Scopus)

Abstract

A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is its support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when the intermediate results of vertical tid lists become too large for memory, which limits the algorithm's scalability. In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We also show that diffsets, when incorporated into previous vertical mining methods, increase performance significantly.
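As an illustration of the idea in the abstract, the following is a minimal Python sketch that contrasts tidset intersection with diffset-based support counting. The toy database and all function names are assumptions made for this example and are not taken from the paper; the recurrences d(XY) = t(X) - t(Y), d(PXY) = d(PY) - d(PX), and support(PXY) = support(PX) - |d(PXY)| are the usual way the diffset definition is written out, which the abstract itself does not spell out.

```python
# Toy vertical database: item -> set of transaction ids (tidset).
# The data and every name below are illustrative assumptions.
tidsets = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
}


def support_by_intersection(x, y):
    """Classic vertical counting: support(XY) = |t(X) & t(Y)|."""
    return len(tidsets[x] & tidsets[y])


def diffset_from_tidsets(x, y):
    """d(XY) = t(X) - t(Y): tids that contain X but not Y."""
    return tidsets[x] - tidsets[y]


def support_from_diffset(parent_support, diffset):
    """support(PXY) = support(PX) - |d(PXY)|."""
    return parent_support - len(diffset)


def extend_diffset(d_px, d_py):
    """d(PXY) = d(PY) - d(PX), for two patterns sharing the prefix P."""
    return d_py - d_px


# Level 2: diffsets derived from the tidsets of single items.
sup_a = len(tidsets["A"])
d_ac = diffset_from_tidsets("A", "C")
d_ad = diffset_from_tidsets("A", "D")
sup_ac = support_from_diffset(sup_a, d_ac)
sup_ad = support_from_diffset(sup_a, d_ad)

# Level 3: only diffsets are combined; no tidset is touched again.
d_acd = extend_diffset(d_ac, d_ad)
sup_acd = support_from_diffset(sup_ac, d_acd)
```

In this toy data the diffsets are never larger than the tidsets they replace, which is the memory effect the abstract describes; on sparse data that relationship may not hold, so the sketch should not be read as a general guarantee.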

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 326-335
Number of pages: 10
DOIs: https://doi.org/10.1145/956750.956788
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: 24 Aug 2003 to 27 Aug 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 24/8/03 to 27/8/03

Fingerprint

Data storage equipment
Scalability

Keywords

  • Association rule mining
  • Diffsets
  • Frequent itemsets

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zaki, M. J., & Gouda, K. (2003). Fast vertical mining using diffsets. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 326-335) https://doi.org/10.1145/956750.956788

@inproceedings{55ee5bbd4e5e4ffea5e34492a897b8f2,
title = "Fast vertical mining using diffsets",
abstract = "A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is its support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when the intermediate results of vertical tid lists become too large for memory, which limits the algorithm's scalability. In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We also show that diffsets, when incorporated into previous vertical mining methods, increase performance significantly.",
keywords = "Association rule mining, Diffsets, Frequent itemsets",
author = "Zaki, {Mohammed J.} and Gouda, Karam",
year = "2003",
month = "12",
day = "1",
doi = "10.1145/956750.956788",
language = "English",
pages = "326--335",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY  - GEN
T1  - Fast vertical mining using diffsets
AU  - Zaki, Mohammed J.
AU  - Gouda, Karam
PY  - 2003/12/1
Y1  - 2003/12/1
AB  - A number of vertical mining algorithms have recently been proposed for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is its support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when the intermediate results of vertical tid lists become too large for memory, which limits the algorithm's scalability. In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We also show that diffsets, when incorporated into previous vertical mining methods, increase performance significantly.
KW  - Association rule mining
KW  - Diffsets
KW  - Frequent itemsets
UR  - http://www.scopus.com/inward/record.url?scp=6344277753&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=6344277753&partnerID=8YFLogxK
U2  - 10.1145/956750.956788
DO  - 10.1145/956750.956788
M3  - Conference contribution
AN  - SCOPUS:6344277753
SP  - 326
EP  - 335
BT  - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ER  -