Selecting the right objective measure for association analysis

Pang Ning Tan, Vipin Kumar, Jaideep Srivastava

Research output: Contribution to journal › Article

339 Citations (Scopus)

Abstract

Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that there may be better alternatives available for their application. In this paper, we describe several key properties one should examine in order to select the right measure for a given application. A comparative study of these properties is made using twenty-one measures that were originally developed in diverse fields such as statistics, social science, machine learning, and data mining. We show that depending on its properties, each measure is useful for some application, but not for others. We also demonstrate two scenarios in which many existing measures become consistent with each other, namely, when support-based pruning and a technique known as table standardization are applied. Finally, we present an algorithm for selecting a small set of patterns such that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.
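To make the point about conflicting measures concrete, here is a minimal sketch that scores two item pairs with four of the measures named in the abstract. It assumes the common textbook definitions over a 2x2 contingency table (support, confidence for the rule A -> B, interest factor, and the phi correlation coefficient); the abstract itself does not give formulas. The function name `association_measures` and the counts for tables X and Y are hypothetical, chosen only for illustration.

```python
from math import sqrt


def association_measures(f11, f10, f01, f00):
    """Score the pattern {A, B} from a 2x2 contingency table.

    f11: transactions containing both A and B
    f10: A without B; f01: B without A; f00: neither
    """
    n = f11 + f10 + f01 + f00
    a = f11 + f10          # transactions containing A
    b = f11 + f01          # transactions containing B
    not_a, not_b = n - a, n - b
    if 0 in (a, b, not_a, not_b):
        raise ValueError("every row and column total must be non-zero")
    return {
        "support": f11 / n,
        "confidence": f11 / a,            # rule A -> B
        "interest": n * f11 / (a * b),    # also known as lift
        "phi": (f11 * f00 - f10 * f01) / sqrt(a * b * not_a * not_b),
    }


# Two made-up tables (illustrative counts, not data from the paper).
x = association_measures(60, 10, 20, 10)
y = association_measures(10, 5, 5, 80)

for name in x:
    winner = "X" if x[name] > y[name] else "Y"
    print(f"{name:<10} X={x[name]:.3f}  Y={y[name]:.3f}  higher: {winner}")
```

With these counts, support and confidence rank X above Y, while interest factor and phi rank Y above X, which is the kind of disagreement the abstract describes.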

Original language: English
Pages (from-to): 293-313
Number of pages: 21
Journal: Information Systems
Volume: 29
Issue number: 4
DOI: 10.1016/S0306-4379(03)00072-3
Publication status: Published - Jun 2004
Externally published: Yes

Fingerprint

Data mining
Social sciences
Standardization
Learning systems
Entropy
Statistics
Pruning
Confidence
Factors
Ranking
Scenarios
Comparative study
Machine learning

ASJC Scopus subject areas

  • Management Information Systems
  • Management of Technology and Innovation
  • Hardware and Architecture
  • Information Systems
  • Software

Cite this

Selecting the right objective measure for association analysis. / Tan, Pang Ning; Kumar, Vipin; Srivastava, Jaideep.

In: Information Systems, Vol. 29, No. 4, 06.2004, p. 293-313.

Tan, Pang Ning ; Kumar, Vipin ; Srivastava, Jaideep. / Selecting the right objective measure for association analysis. In: Information Systems. 2004 ; Vol. 29, No. 4. pp. 293-313.
@article{3f0050b0659f4d81a5cb7360ae3dae10,
title = "Selecting the right objective measure for association analysis",
author = "Tan, {Pang Ning} and Vipin Kumar and Jaideep Srivastava",
year = "2004",
month = "6",
doi = "10.1016/S0306-4379(03)00072-3",
language = "English",
volume = "29",
pages = "293--313",
journal = "Information Systems",
issn = "0306-4379",
publisher = "Elsevier Limited",
number = "4",

}

TY - JOUR

T1 - Selecting the right objective measure for association analysis

AU - Tan, Pang Ning

AU - Kumar, Vipin

AU - Srivastava, Jaideep

PY - 2004/6

Y1 - 2004/6

AB - Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that there may be better alternatives available for their application. In this paper, we describe several key properties one should examine in order to select the right measure for a given application. A comparative study of these properties is made using twenty-one measures that were originally developed in diverse fields such as statistics, social science, machine learning, and data mining. We show that depending on its properties, each measure is useful for some application, but not for others. We also demonstrate two scenarios in which many existing measures become consistent with each other, namely, when support-based pruning and a technique known as table standardization are applied. Finally, we present an algorithm for selecting a small set of patterns such that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.

UR - http://www.scopus.com/inward/record.url?scp=1242308945&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1242308945&partnerID=8YFLogxK

U2 - 10.1016/S0306-4379(03)00072-3

DO - 10.1016/S0306-4379(03)00072-3

M3 - Article

AN - SCOPUS:1242308945

VL - 29

SP - 293

EP - 313

JO - Information Systems

JF - Information Systems

SN - 0306-4379

IS - 4

ER -