Fast vertical mining using diffsets

Mohammed J. Zaki, Karam Gouda

Research output: Contribution to conference, Paper

363 Citations (Scopus)

Abstract

A number of vertical mining algorithms have been proposed recently for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when the intermediate results of vertical tid lists become too large for memory, which limits the scalability of the algorithm. In this paper we present a novel vertical data representation called Diffset, which only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We also show how diffsets, when incorporated into previous vertical mining methods, increase performance significantly.
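The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy database, the function names, and the choice of prefix are assumptions made here for illustration, and the recurrences follow the commonly stated formulation of diffsets, in which d(PX) = t(P) - t(X) and d(PXY) = d(PY) - d(PX), with support(PXY) = support(PX) - |d(PXY)|.

```python
# Toy vertical database: item -> set of transaction ids (tidset).
# The data is invented purely for illustration.
tidsets = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
}


def tidset_of(itemset, db):
    """Baseline vertical approach: intersect the tidsets of all items."""
    items = iter(itemset)
    result = set(db[next(items)])
    for item in items:
        result &= db[item]
    return result


def diffset_from_tidsets(t_prefix, t_item):
    """d(PX) = t(P) - t(X): tids that contain P but not PX."""
    return t_prefix - t_item


def diffset_from_diffsets(d_px, d_py):
    """d(PXY) = d(PY) - d(PX): tids that contain PX but not PXY."""
    return d_py - d_px


# Use C as the prefix P, with extensions X = A and Y = D.
t_c = tidsets["C"]
d_ca = diffset_from_tidsets(t_c, tidsets["A"])
d_cd = diffset_from_tidsets(t_c, tidsets["D"])

# Support is derived from the parent's support minus the diffset size.
support_ca = len(t_c) - len(d_ca)
d_cad = diffset_from_diffsets(d_ca, d_cd)
support_cad = support_ca - len(d_cad)

# Cross-check against plain tidset intersection.
t_cad = tidset_of(["C", "A", "D"], tidsets)
```

In this toy case the tidset of CA holds four tids while its diffset holds two, and the support of CAD obtained from diffsets agrees with the size of the intersected tidset. The saving on real data depends on how dense the database is.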

Original language: English
Pages: 326-335
Number of pages: 10
DOIs: https://doi.org/10.1145/956750.956788
Publication status: Published - 1 Dec 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: 24 Aug 2003 to 27 Aug 2003

Keywords

  • Association rule mining
  • Diffsets
  • Frequent itemsets

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zaki, M. J., & Gouda, K. (2003). Fast vertical mining using diffsets. 326-335. Paper presented at 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, Washington, DC, United States. https://doi.org/10.1145/956750.956788