Evaluation of sampling for data mining of association rules

Mohammed Javeed Zaki, Srinivasan Parthasarathy, Wei Li, Mitsunori Ogihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

97 Citations (Scopus)

Abstract

Discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring itemsets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. In this paper we show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. We may also be able to make the sampled database resident in main-memory. Furthermore, we show that sampling can accurately represent the data patterns in the database with high confidence. We experimentally evaluate the effectiveness of sampling on different databases, and study the relationship between the performance, and the accuracy and confidence of the chosen sample.
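The idea summarized in the abstract can be illustrated with a minimal sketch: draw a random subset of the transactions, estimate itemset support on the subset, and compare it with the support measured on the full database. The function names, the fixed itemset size, and the choice of simple random sampling without replacement are assumptions made for this illustration; they are not taken from the paper and may differ from the authors' procedure.

```python
import random
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size):
    """Return the fraction of transactions that contain each itemset of `size` items."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}


def frequent_itemsets(transactions, size, min_support):
    """Keep only the itemsets whose support is at least `min_support`."""
    supports = itemset_support(transactions, size)
    return {i: s for i, s in supports.items() if s >= min_support}


def sample_transactions(transactions, fraction, seed=0):
    """Draw a simple random sample (without replacement) of the transactions."""
    rng = random.Random(seed)
    k = max(1, int(len(transactions) * fraction))
    return rng.sample(transactions, k)
```

Running `frequent_itemsets` on the output of `sample_transactions` instead of the full list scans only a fraction of the transactions. The supports it reports are estimates, so itemsets close to the threshold can be missed or wrongly included; the abstract describes this as the trade-off between performance and the accuracy and confidence of the chosen sample.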

Original language: English
Title of host publication: Proceedings of the IEEE International Workshop on Research Issues in Data Engineering
Editors: P. Scheuermann
Place of Publication: Los Alamitos, CA, United States
Publisher: IEEE
Pages: 42-50
Number of pages: 9
Publication status: Published - 1 Jan 1997
Externally published: Yes
Event: Proceedings of the 1997 7th International Workshop on Research Issues in Data Engineering, RIDE'97 - Birmingham, UK
Duration: 7 Apr 1997 to 8 Apr 1997

Other

Other: Proceedings of the 1997 7th International Workshop on Research Issues in Data Engineering, RIDE'97
City: Birmingham, UK
Period: 7/4/97 to 8/4/97

Fingerprint

Association rules
Data mining
Sampling
Scanning
Data storage equipment
Costs

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Engineering (miscellaneous)

Cite this

Zaki, M. J., Parthasarathy, S., Li, W., & Ogihara, M. (1997). Evaluation of sampling for data mining of association rules. In P. Scheuermann (Ed.), Proceedings of the IEEE International Workshop on Research Issues in Data Engineering (pp. 42-50). Los Alamitos, CA, United States: IEEE.

Evaluation of sampling for data mining of association rules. / Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Li, Wei; Ogihara, Mitsunori.

Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. ed. / P. Scheuermann. Los Alamitos, CA, United States : IEEE, 1997. p. 42-50.

Zaki, MJ, Parthasarathy, S, Li, W & Ogihara, M 1997, Evaluation of sampling for data mining of association rules. in P Scheuermann (ed.), Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. IEEE, Los Alamitos, CA, United States, pp. 42-50, Proceedings of the 1997 7th International Workshop on Research Issues in Data Engineering, RIDE'97, Birmingham, UK, 7/4/97.
Zaki MJ, Parthasarathy S, Li W, Ogihara M. Evaluation of sampling for data mining of association rules. In Scheuermann P, editor, Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. Los Alamitos, CA, United States: IEEE. 1997. p. 42-50
Zaki, Mohammed Javeed ; Parthasarathy, Srinivasan ; Li, Wei ; Ogihara, Mitsunori. / Evaluation of sampling for data mining of association rules. Proceedings of the IEEE International Workshop on Research Issues in Data Engineering. editor / P. Scheuermann. Los Alamitos, CA, United States : IEEE, 1997. pp. 42-50
@inproceedings{6949a070d15f4b0ea118df0be1c2ffe7,
title = "Evaluation of sampling for data mining of association rules",
abstract = "Discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring itemsets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. In this paper we show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. We may also be able to make the sampled database resident in main-memory. Furthermore, we show that sampling can accurately represent the data patterns in the database with high confidence. We experimentally evaluate the effectiveness of sampling on different databases, and study the relationship between the performance, and the accuracy and confidence of the chosen sample.",
author = "Zaki, {Mohammed Javeed} and Srinivasan Parthasarathy and Wei Li and Mitsunori Ogihara",
year = "1997",
month = "1",
day = "1",
language = "English",
pages = "42--50",
editor = "P. Scheuermann",
booktitle = "Proceedings of the IEEE International Workshop on Research Issues in Data Engineering",
publisher = "IEEE",
address = "Los Alamitos, CA, United States",
}

TY - GEN
T1 - Evaluation of sampling for data mining of association rules
AU - Zaki, Mohammed Javeed
AU - Parthasarathy, Srinivasan
AU - Li, Wei
AU - Ogihara, Mitsunori
PY - 1997/1/1
Y1 - 1997/1/1
AB - Discovery of association rules is a prototypical problem in data mining. The current algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring itemsets (or set of items). For large databases, the I/O overhead in scanning the database can be extremely high. In this paper we show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. We may also be able to make the sampled database resident in main-memory. Furthermore, we show that sampling can accurately represent the data patterns in the database with high confidence. We experimentally evaluate the effectiveness of sampling on different databases, and study the relationship between the performance, and the accuracy and confidence of the chosen sample.

UR - http://www.scopus.com/inward/record.url?scp=0030645988&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030645988&partnerID=8YFLogxK
M3 - Conference contribution
SP - 42
EP - 50
BT - Proceedings of the IEEE International Workshop on Research Issues in Data Engineering
A2 - Scheuermann, P.
PB - IEEE
CY - Los Alamitos, CA, United States
ER -