High-confidence rule mining for microarray analysis

Tara McIntosh, Sanjay Chawla

Research output: Contribution to journal (Article)

41 Citations (Scopus)

Abstract

We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms use the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming, because it prunes many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MAXCONF, to mine high-confidence rules from microarray data. MAXCONF is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MAXCONF outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MAXCONF are substantially more interesting and meaningful than those discovered by support-based methods.
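
To make the support-versus-confidence point concrete, here is a minimal Python sketch using the standard textbook definitions of the two measures. It is not MAXCONF and does not reproduce the paper's algorithm or data: the toy experiments, the gene names `g1` to `g6`, the helper names `support` and `confidence`, and the 0.3 threshold are all assumptions made for illustration.

```python
# Toy discretized microarray (hypothetical): each set is one experiment and
# lists the genes flagged as highly expressed in it.
experiments = [
    {"g1", "g2", "g5"},
    {"g1", "g2"},
    {"g3", "g4", "g5"},
    {"g5", "g6"},
    {"g5"},
    {"g5", "g6"},
    {"g6"},
    {"g5", "g6"},
]


def support(itemset, rows):
    """Fraction of rows that contain every gene in `itemset`."""
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, rows)
    if base == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    return support(antecedent | consequent, rows) / base


MIN_SUPPORT = 0.3  # illustrative threshold, not taken from the paper

for lhs, rhs in [({"g1"}, {"g2"}), ({"g5"}, {"g6"})]:
    sup = support(lhs | rhs, experiments)
    conf = confidence(lhs, rhs, experiments)
    kept = sup >= MIN_SUPPORT  # what a support-based pruner would decide
    print(sorted(lhs), "=>", sorted(rhs), "support:", sup,
          "confidence:", conf, "kept by support pruning:", kept)
```

In this toy data the rule `g1 => g2` holds every time `g1` appears, yet it falls below the support threshold and would be discarded, while `g5 => g6` clears the threshold with a much lower confidence. That is the kind of rule the abstract says support-based pruning loses.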

Original language: English
Pages (from-to): 611-623
Number of pages: 13
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 4
Issue number: 4
DOI: 10.1109/tcbb.2007.1050
Publication status: Published - Oct 2007
Externally published: Yes

Keywords

  • Association rules
  • Data mining
  • High-confidence rule mining
  • Microarray analysis

ASJC Scopus subject areas

  • Engineering(all)
  • Agricultural and Biological Sciences (miscellaneous)

Cite this

High-confidence rule mining for microarray analysis. / McIntosh, Tara; Chawla, Sanjay.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 4, No. 4, 10.2007, p. 611-623.

@article{74cfbaf539314111b47d11a4722443b1,
title = "High-confidence rule mining for microarray analysis",
abstract = "We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms use the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming, because it prunes many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MAXCONF, to mine high-confidence rules from microarray data. MAXCONF is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MAXCONF outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MAXCONF are substantially more interesting and meaningful than those discovered by support-based methods.",
keywords = "Association rules, Data mining, High-confidence rule mining, Microarray analysis",
author = "Tara McIntosh and Sanjay Chawla",
year = "2007",
month = "10",
doi = "10.1109/tcbb.2007.1050",
language = "English",
volume = "4",
pages = "611--623",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",
}

TY - JOUR

T1 - High-confidence rule mining for microarray analysis

AU - McIntosh, Tara

AU - Chawla, Sanjay

PY - 2007/10

Y1 - 2007/10

AB - We present an association rule mining method for mining high-confidence rules, which describe interesting gene relationships from microarray data sets. Microarray data sets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimized for sparse data sets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense data sets. These algorithms use the support measure to prune infrequent relationships and so reduce the search space. This reliance on support is a major shortcoming, because it prunes many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MAXCONF, to mine high-confidence rules from microarray data. MAXCONF is a support-free algorithm that directly uses the confidence measure to effectively prune the search space. Experiments on three microarray data sets show that MAXCONF outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MAXCONF are substantially more interesting and meaningful than those discovered by support-based methods.

KW - Association rules

KW - Data mining

KW - High-confidence rule mining

KW - Microarray analysis

UR - http://www.scopus.com/inward/record.url?scp=36249019195&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36249019195&partnerID=8YFLogxK

U2 - 10.1109/tcbb.2007.1050

DO - 10.1109/tcbb.2007.1050

M3 - Article

C2 - 17975272

AN - SCOPUS:36249019195

VL - 4

SP - 611

EP - 623

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 4

ER -