Distribution-based synthetic database generation techniques for itemset mining

Ganesh Ramesh, Mohammed J. Zaki, William A. Maniatty

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

The resource requirements of frequent pattern mining algorithms depend mainly on the length distribution of the mined patterns in the database. Synthetic databases, which are used to benchmark the performance of algorithms, tend to have distributions far different from those observed in real datasets. In this paper we focus on the problem of synthetic database generation and propose algorithms to effectively embed any given set of maximal pattern collections within the database. We make the following contributions:

1. A database generation technique is presented which takes k maximal itemset collections as input and constructs a database which produces these maximal collections as output when mined at k levels of support. To analyze the efficiency of the procedure, upper bounds are provided on the number of transactions output in the generated database.
2. A compression method is used and extended to reduce the size of the output database. An optimization to the generation procedure is provided which could potentially reduce the number of transactions generated.
3. Preliminary experimental results are presented to demonstrate the feasibility of using the generation technique.
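To make the first contribution concrete, here is a minimal sketch of what it means for a database to "produce" a given maximal itemset collection when mined. It covers only the single-level case (k = 1) and uses a naive construction that simply repeats each maximal itemset as a transaction; this is an assumption made for illustration and is not the generation procedure, bound, or compression method from the paper. The function names (naive_database, support, maximal_frequent_itemsets) are hypothetical, and the miner is a brute-force check suitable only for tiny inputs.

```python
from itertools import combinations


def naive_database(maximal_itemsets, min_support):
    """Naive single-level construction (illustrative, not the paper's method):
    emit every maximal itemset as a transaction, min_support times.
    Assumes no input itemset is a subset of another."""
    return [frozenset(m) for m in maximal_itemsets for _ in range(min_support)]


def support(itemset, database):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for transaction in database if itemset <= transaction)


def maximal_frequent_itemsets(database, min_support):
    """Brute-force miner: enumerate all itemsets, keep the frequent ones,
    then keep those with no frequent proper superset."""
    items = sorted(set().union(*database))
    frequent = [
        frozenset(candidate)
        for size in range(1, len(items) + 1)
        for candidate in combinations(items, size)
        if support(frozenset(candidate), database) >= min_support
    ]
    return {f for f in frequent if not any(f < g for g in frequent)}


# Usage: embed the collection {abc, cd} at support level 2 and mine it back.
collection = [{"a", "b", "c"}, {"c", "d"}]
db = naive_database(collection, min_support=2)
mined = maximal_frequent_itemsets(db, min_support=2)
print(mined == {frozenset(m) for m in collection})  # True
```

The naive database works because every transaction is itself one of the maximal itemsets, so any itemset outside their subsets has support zero. It needs one transaction per maximal itemset per unit of support, which hints at why bounds on the number of transactions and a compression step matter once several support levels must be satisfied at once.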

Original language: English
Title of host publication: Proceedings of the International Database Engineering and Applications Symposium, IDEAS
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 307-316
Number of pages: 10
Volume: 2005-January
Edition: January
DOIs: https://doi.org/10.1109/IDEAS.2005.22
Publication status: Published - 2005
Externally published: Yes
Event: 9th International Database Engineering and Application Symposium, IDEAS 2005 - Montreal, Canada
Duration: 25 Jul 2005 to 27 Jul 2005

Other

Other: 9th International Database Engineering and Application Symposium, IDEAS 2005
Country: Canada
City: Montreal
Period: 25/7/05 to 27/7/05

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Ramesh, G., Zaki, M. J., & Maniatty, W. A. (2005). Distribution-based synthetic database generation techniques for itemset mining. In Proceedings of the International Database Engineering and Applications Symposium, IDEAS (January ed., Vol. 2005-January, pp. 307-316). [1540921] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IDEAS.2005.22

Distribution-based synthetic database generation techniques for itemset mining. / Ramesh, Ganesh; Zaki, Mohammed J.; Maniatty, William A.

Proceedings of the International Database Engineering and Applications Symposium, IDEAS. Vol. 2005-January, January ed. Institute of Electrical and Electronics Engineers Inc., 2005. p. 307-316, 1540921.

Ramesh, G, Zaki, MJ & Maniatty, WA 2005, Distribution-based synthetic database generation techniques for itemset mining. in Proceedings of the International Database Engineering and Applications Symposium, IDEAS. January edn, vol. 2005-January, 1540921, Institute of Electrical and Electronics Engineers Inc., pp. 307-316, 9th International Database Engineering and Application Symposium, IDEAS 2005, Montreal, Canada, 25/7/05. https://doi.org/10.1109/IDEAS.2005.22
Ramesh G, Zaki MJ, Maniatty WA. Distribution-based synthetic database generation techniques for itemset mining. In Proceedings of the International Database Engineering and Applications Symposium, IDEAS. January ed. Vol. 2005-January. Institute of Electrical and Electronics Engineers Inc. 2005. p. 307-316. 1540921 https://doi.org/10.1109/IDEAS.2005.22
Ramesh, Ganesh; Zaki, Mohammed J.; Maniatty, William A. / Distribution-based synthetic database generation techniques for itemset mining. Proceedings of the International Database Engineering and Applications Symposium, IDEAS. Vol. 2005-January, January ed. Institute of Electrical and Electronics Engineers Inc., 2005. pp. 307-316
@inproceedings{40befdc60ed94f55b82acad0eaf3773b,
title = "Distribution-based synthetic database generation techniques for itemset mining",
abstract = "The resource requirements of frequent pattern mining algorithms depend mainly on the length distribution of the mined patterns in the database. Synthetic databases, which are used to benchmark performance of algorithms, tend to have distributions far different from those observed in real datasets. In this paper we focus on the problem of synthetic database generation and propose algorithms to effectively embed within the database, any given set of maximal pattern collections, and make the following contributions: 1. A database generation technique is presented which takes k maximal itemset collections as input, and constructs a database which produces these maximal collections as output, when mined at k levels of support. To analyze the efficiency of the procedure, upper bounds are provided on the number of transactions output in the generated database. 2. A compression method is used and extended to reduce the size of the output database. An optimization to the generation procedure is provided which could potentially reduce the number of transactions generated. 3. Preliminary experimental results are presented to demonstrate the feasibility of using the generation technique.",
author = "Ramesh, Ganesh and Zaki, {Mohammed J.} and Maniatty, {William A.}",
year = "2005",
doi = "10.1109/IDEAS.2005.22",
language = "English",
volume = "2005-January",
pages = "307--316",
booktitle = "Proceedings of the International Database Engineering and Applications Symposium, IDEAS",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
edition = "January",

}

TY - GEN

T1 - Distribution-based synthetic database generation techniques for itemset mining

AU - Ramesh, Ganesh

AU - Zaki, Mohammed J.

AU - Maniatty, William A.

PY - 2005

Y1 - 2005

N2 - The resource requirements of frequent pattern mining algorithms depend mainly on the length distribution of the mined patterns in the database. Synthetic databases, which are used to benchmark performance of algorithms, tend to have distributions far different from those observed in real datasets. In this paper we focus on the problem of synthetic database generation and propose algorithms to effectively embed within the database, any given set of maximal pattern collections, and make the following contributions: 1. A database generation technique is presented which takes k maximal itemset collections as input, and constructs a database which produces these maximal collections as output, when mined at k levels of support. To analyze the efficiency of the procedure, upper bounds are provided on the number of transactions output in the generated database. 2. A compression method is used and extended to reduce the size of the output database. An optimization to the generation procedure is provided which could potentially reduce the number of transactions generated. 3. Preliminary experimental results are presented to demonstrate the feasibility of using the generation technique.

UR - http://www.scopus.com/inward/record.url?scp=74849096429&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74849096429&partnerID=8YFLogxK

U2 - 10.1109/IDEAS.2005.22

DO - 10.1109/IDEAS.2005.22

M3 - Conference contribution

VL - 2005-January

SP - 307

EP - 316

BT - Proceedings of the International Database Engineering and Applications Symposium, IDEAS

PB - Institute of Electrical and Electronics Engineers Inc.

ER -