Feasible Itemset Distributions in Data Mining

Theory and Application

Ganesh Ramesh, William A. Maniatty, Mohammed J. Zaki

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

27 Citations (Scopus)

Abstract

Computing frequent itemsets and maximally frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.
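To make the terms in the abstract concrete, the following is a minimal brute-force sketch, not the authors' method. It enumerates the frequent itemsets of a toy database, filters out the maximal ones, and tallies how many itemsets of each length occur, which is the "length distribution" the abstract refers to. The function names, the toy transactions, and the support threshold of 2 are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets contained in >= min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent[cset] = support
                found = True
        if not found:
            # No frequent k-itemset means no frequent (k+1)-itemset either,
            # because every subset of a frequent itemset is itself frequent.
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


def length_distribution(itemsets):
    """Map each itemset length to the number of itemsets of that length."""
    return dict(sorted(Counter(len(s) for s in itemsets).items()))


# Hypothetical toy database: five transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("abd"),
    frozenset("cd"),
    frozenset("d"),
]

frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)

print("frequent length distribution:", length_distribution(frequent))
print("maximal length distribution:", length_distribution(maximal))
```

On this toy input there are four frequent itemsets of length 1, three of length 2, and one of length 3, while the only maximal frequent itemsets are {a, b, c} and {d}. Exhaustive enumeration is exponential in the number of items, so the sketch is useful only for illustrating the definitions on very small inputs.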

Original language: English
Title of host publication: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
Pages: 284-295
Number of pages: 12
Volume: 22
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: Twenty second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2003 - San Diego, CA, United States
Duration: 9 Jun 2003 to 11 Jun 2003

Fingerprint

Data mining
Benchmarking

ASJC Scopus subject areas

  • Software

Cite this

Ramesh, G., Maniatty, W. A., & Zaki, M. J. (2003). Feasible Itemset Distributions in Data Mining: Theory and Application. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Vol. 22, pp. 284-295)

@inproceedings{958602b44bb944189eaf15e570eb0b95,
title = "Feasible Itemset Distributions in Data Mining: Theory and Application",
abstract = "Computing frequent itemsets and maximally frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.",
author = "Ganesh Ramesh and Maniatty, {William A.} and Zaki, {Mohammed J.}",
year = "2003",
month = "12",
day = "1",
language = "English",
volume = "22",
pages = "284--295",
booktitle = "Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems",

}

TY - GEN

T1 - Feasible Itemset Distributions in Data Mining

T2 - Theory and Application

AU - Ramesh, Ganesh

AU - Maniatty, William A.

AU - Zaki, Mohammed J.

PY - 2003/12/1

Y1 - 2003/12/1

N2 - Computing frequent itemsets and maximally frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.

UR - http://www.scopus.com/inward/record.url?scp=1142299753&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1142299753&partnerID=8YFLogxK

M3 - Conference contribution

VL - 22

SP - 284

EP - 295

BT - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

ER -