Quality-aware association rule mining

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The quality of discovered association rules is commonly evaluated by interestingness measures (typically support and confidence), with the purpose of helping the user understand and use the newly discovered knowledge. Low-quality datasets have a strongly negative impact on the quality of the discovered association rules, and one might legitimately wonder whether a so-called "interesting" rule, noted LHS -> RHS, is meaningful when 30% of LHS data are no longer up to date, 20% of RHS data are not accurate, and 15% of LHS data come from a data source that is well known for its poor credibility. In this paper we propose to integrate data quality measures for effective and quality-aware association rule mining, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show, for different variations of data quality indicators, the corresponding cost and quality of discovered association rules that can be legitimately (or not) selected.
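As a rough illustration of the measures the abstract names, the sketch below computes support and confidence for a rule LHS -> RHS and then applies a simple data quality check before accepting the rule. It is not the paper's cost-based probabilistic model: the per-item quality scores, the "worst item" aggregation in `rule_quality`, the thresholds, and the toy transactions are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def count(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain `itemset`."""
    if not transactions:
        return 0.0
    return count(transactions, itemset) / len(transactions)


def confidence(transactions: List[Transaction],
               lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Estimate of P(RHS | LHS): count(LHS and RHS) / count(LHS)."""
    lhs_count = count(transactions, lhs)
    if lhs_count == 0:
        return 0.0
    return count(transactions, lhs | rhs) / lhs_count


def rule_quality(item_quality: Dict[str, float],
                 lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Hypothetical aggregate: the lowest quality score among the rule's items."""
    return min(item_quality[item] for item in lhs | rhs)


def is_legitimately_interesting(transactions: List[Transaction],
                                lhs: FrozenSet[str], rhs: FrozenSet[str],
                                item_quality: Dict[str, float],
                                min_support: float = 0.4,
                                min_confidence: float = 0.6,
                                min_quality: float = 0.8) -> bool:
    """Accept a rule only if it is interesting AND its data is good enough.

    All three thresholds are illustrative assumptions.
    """
    return (support(transactions, lhs | rhs) >= min_support
            and confidence(transactions, lhs, rhs) >= min_confidence
            and rule_quality(item_quality, lhs, rhs) >= min_quality)


# Toy data (invented for this example).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
lhs, rhs = frozenset({"bread"}), frozenset({"milk"})

# Scores in [0, 1]; imagine they summarize freshness, accuracy and credibility.
poor_quality = {"bread": 0.7, "milk": 0.95, "butter": 0.9}
good_quality = {"bread": 0.9, "milk": 0.95, "butter": 0.9}

print(support(transactions, lhs | rhs), confidence(transactions, lhs, rhs))
print(is_legitimately_interesting(transactions, lhs, rhs, poor_quality))
print(is_legitimately_interesting(transactions, lhs, rhs, good_quality))
```

With the toy data, the rule passes the support and confidence thresholds in both cases, but it is rejected when the LHS item carries a low quality score. This mirrors the abstract's point that an "interesting" rule may not be a trustworthy one.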

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 440-449
Number of pages: 10
Volume: 3918 LNAI
Publication status: Published - 14 Jul 2006
Externally published: Yes
Event: 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2006 - Singapore, Singapore
Duration: 9 Apr 2006 to 12 Apr 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3918 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2006
Country: Singapore
City: Singapore
Period: 9/4/06 to 12/4/06

Fingerprint

Association Rule Mining
Association rules
Costs and Cost Analysis
Information Storage and Retrieval
Statistical Models
Data Quality
Quality Measures
Credibility
Costs
Probabilistic Model
Confidence
Integrate
Data Accuracy
Datasets
Experiments

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Berti-Equille, L. (2006). Quality-aware association rule mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3918 LNAI, pp. 440-449). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3918 LNAI).

@inproceedings{ecac45cf3d32439ea976eb022ba9b91a,
title = "Quality-aware association rule mining",
abstract = "The quality of discovered association rules is commonly evaluated by interestingness measures (commonly support and confidence) with the purpose of supplying subsidies to the user in the understanding and use of the new discovered knowledge. Low-quality datasets have a very bad impact over the quality of the discovered association rules, and one might legitimately wonder whether a so-called {"}interesting{"} rule noted LHS -> RHS is meaningful when 30 {\%} of LHS data are not up-to-date anymore, 20{\%} of RHS data are not accurate, and 15{\%} of LHS data come from a data source that is well-known for its bad credibility. In this paper we propose to integrate data quality measures for effective and quality-aware association rule mining and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show for different variations of data quality indicators the corresponding cost and quality of discovered association rules that can be legitimately (or not) selected.",
author = "Laure Berti-Equille",
year = "2006",
month = "7",
day = "14",
language = "English",
isbn = "3540332065",
volume = "3918 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "440--449",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY  - GEN
T1  - Quality-aware association rule mining
AU  - Berti-Equille, Laure
PY  - 2006/7/14
Y1  - 2006/7/14
AB  - The quality of discovered association rules is commonly evaluated by interestingness measures (commonly support and confidence) with the purpose of supplying subsidies to the user in the understanding and use of the new discovered knowledge. Low-quality datasets have a very bad impact over the quality of the discovered association rules, and one might legitimately wonder whether a so-called "interesting" rule noted LHS -> RHS is meaningful when 30 % of LHS data are not up-to-date anymore, 20% of RHS data are not accurate, and 15% of LHS data come from a data source that is well-known for its bad credibility. In this paper we propose to integrate data quality measures for effective and quality-aware association rule mining and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-CUP-98 datasets show for different variations of data quality indicators the corresponding cost and quality of discovered association rules that can be legitimately (or not) selected.
UR  - http://www.scopus.com/inward/record.url?scp=33745801621&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33745801621&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:33745801621
SN  - 3540332065
SN  - 9783540332060
VL  - 3918 LNAI
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 440
EP  - 449
BT  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ER  -