Selecting the right interestingness measure for association patterns

Pang Ning Tan, Vipin Kumar, Jaideep Srivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

578 Citations (Scopus)

Abstract

Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
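
The measures named in the abstract are all functions of a 2x2 contingency table of co-occurrence counts for an itemset pair. As a quick illustration only (this sketch is not taken from the paper, and the cell names f11, f10, f01, f00 are assumptions for this example), support, confidence, and lift for a rule A -> B could be computed like this:

    # Illustrative sketch: three of the measures mentioned in the abstract,
    # computed from a 2x2 contingency table for the rule A -> B.
    # f11 = count(A and B), f10 = count(A and not B),
    # f01 = count(not A and B), f00 = count(not A and not B).
    def rule_measures(f11, f10, f01, f00):
        n = f11 + f10 + f01 + f00
        support = f11 / n                      # P(A, B)
        confidence = f11 / (f11 + f10)         # P(B | A)
        lift = confidence / ((f11 + f01) / n)  # P(B | A) / P(B)
        return support, confidence, lift

    # Example: A and B co-occur more often than independence would predict.
    s, c, l = rule_measures(f11=60, f10=10, f01=10, f00=20)
    print(f"support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
    # support=0.60, confidence=0.86, lift=1.22

Different measures of this kind can rank the same set of contingency tables very differently, which is why the paper catalogues the properties to examine before choosing one.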

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: D. Hand, D. Keim, R. Ng
Pages: 32-41
Number of pages: 10
Publication status: Published - 2002
Externally published: Yes
Event: KDD 2002 - Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada
Duration: 23 Jul 2002 - 26 Jul 2002



Keywords

  • Associations
  • Contingency tables
  • Interestingness measure

ASJC Scopus subject areas

  • Information Systems

Cite this

Tan, P. N., Kumar, V., & Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In D. Hand, D. Keim, & R. Ng (Eds.), Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 32-41).
