Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets

Florian Verhein, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

28 Citations (Scopus)

Abstract

The application of association rule mining to classification has led to a new family of classifiers which are often referred to as "Associative Classifiers (ACs)". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where "imbalanced data sets" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced data sets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.
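
The abstract names the Class Correlation Ratio but does not define it. As a rough illustration only, the sketch below assumes that CCR compares how strongly a rule's antecedent X is correlated with its class y against how strongly X is correlated with the complement of y, with correlation taken as lift. That reading, the function names, and the contingency counts are assumptions made for this example, not details stated in the abstract.

```python
import math


def lift(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Lift of a rule X -> y: P(X, y) / (P(X) * P(y)), from raw counts."""
    return (n_xy * n) / (n_x * n_y)


def class_correlation_ratio(a: int, b: int, c: int, d: int) -> float:
    """Assumed CCR: lift(X -> y) / lift(X -> not y).

    2x2 contingency counts (hypothetical naming):
      a = records with X and class y
      b = records with X and not class y
      c = records without X, class y
      d = records without X, not class y
    """
    n = a + b + c + d
    if a + b == 0 or a + c == 0 or b + d == 0:
        raise ValueError("X, y and not-y must each cover at least one record")
    if b == 0:
        # X never occurs outside class y, so the ratio is unbounded.
        return math.inf
    toward_class = lift(a, a + b, a + c, n)
    toward_other = lift(b, a + b, b + d, n)
    return toward_class / toward_other


# Made-up imbalanced data: 1000 records, only 50 in the minority class y.
# The rule X -> y covers 40 records, half of them in class y.
ccr = class_correlation_ratio(a=20, b=20, c=30, d=930)
confidence = 20 / (20 + 20)
print(f"confidence = {confidence:.2f}, CCR = {ccr:.2f}")
# prints: confidence = 0.50, CCR = 19.00
```

In this made-up example the rule has a confidence of only 0.5, yet under the assumed definition X is 19 times more strongly correlated with the minority class than with the majority class, which is the kind of signal a confidence threshold alone would miss on imbalanced data.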

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Pages: 679-684
Number of pages: 6
DOIs: https://doi.org/10.1109/ICDM.2007.63
Publication status: Published - 2007
Externally published: Yes
Event: 7th IEEE International Conference on Data Mining, ICDM 2007 - Omaha, NE, United States
Duration: 28 Oct 2007 to 31 Oct 2007

Other

Other: 7th IEEE International Conference on Data Mining, ICDM 2007
Country: United States
City: Omaha, NE
Period: 28/10/07 to 31/10/07

Fingerprint

Classifiers
Association rules
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets. / Verhein, Florian; Chawla, Sanjay.

Proceedings - IEEE International Conference on Data Mining, ICDM. 2007. p. 679-684, 4470310.

Verhein, F & Chawla, S 2007, Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets. in Proceedings - IEEE International Conference on Data Mining, ICDM., 4470310, pp. 679-684, 7th IEEE International Conference on Data Mining, ICDM 2007, Omaha, NE, United States, 28/10/07. https://doi.org/10.1109/ICDM.2007.63
Verhein, Florian ; Chawla, Sanjay. / Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets. Proceedings - IEEE International Conference on Data Mining, ICDM. 2007. pp. 679-684
@inproceedings{433319ac1e84429d999b328b867e8979,
title = "Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets",
abstract = "The application of association rule mining to classification has led to a new family of classifiers which are often referred to as {"}Associative Classifiers (ACs){"}. An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where {"}imbalanced data sets{"} are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced data sets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.",
author = "Florian Verhein and Sanjay Chawla",
year = "2007",
doi = "10.1109/ICDM.2007.63",
language = "English",
isbn = "0769530184",
pages = "679--684",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",

}

TY - GEN

T1 - Using significant, positively associated and relatively class correlated rules for associative classification of imbalanced datasets

AU - Verhein, Florian

AU - Chawla, Sanjay

PY - 2007

Y1 - 2007

N2 - The application of association rule mining to classification has led to a new family of classifiers which are often referred to as "Associative Classifiers (ACs)". An advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Rule-based classifiers can play a very important role in applications such as medical diagnosis and fraud detection where "imbalanced data sets" are the norm and not the exception. The focus of this paper is to extend and modify ACs for classification on imbalanced data sets using only statistical techniques. We combine the use of statistically significant rules with a new measure, the Class Correlation Ratio (CCR), to build an AC which we call SPARCCC. Experiments show that in terms of classification quality, SPARCCC performs comparably on balanced data sets and outperforms other AC techniques on imbalanced data sets. It also has a significantly smaller rule base and is much more computationally efficient.

UR - http://www.scopus.com/inward/record.url?scp=49749113225&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=49749113225&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2007.63

DO - 10.1109/ICDM.2007.63

M3 - Conference contribution

SN - 0769530184

SN - 9780769530185

SP - 679

EP - 684

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

ER -