Reasoning about sets using redescription mining

Mohammed J. Zaki, Naren Ramakrishnan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Citations (Scopus)

Abstract

Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
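To make the idea of "subsets of data that afford multiple definitions" concrete, the following is a minimal brute-force sketch in Python, not the algorithm from the paper. It assumes a hypothetical toy dataset (`DATA`, with made-up gene names `g1`..`g5` and descriptor names `A`..`D`), groups descriptor combinations by the exact set of genes they cover, keeps only the minimal generators of each group, and reports every pair of minimal generators that covers the same genes as an exact redescription. The function names (`extent`, `minimal_generators`, `exact_redescriptions`) are illustrative and do not come from the source; the sketch also makes no attempt to prune redundant redescriptions.

```python
from itertools import combinations

# Hypothetical toy data (an assumption for illustration only):
# each gene maps to the descriptor sets it belongs to.
DATA = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B", "C"},
    "g3": {"A", "C"},
    "g4": {"B", "D"},
    "g5": {"D"},
}


def extent(itemset, data):
    """Return the genes that belong to every descriptor in `itemset`."""
    return frozenset(g for g, descs in data.items() if itemset <= descs)


def minimal_generators(data):
    """Map each non-empty extent to its minimal generators.

    Brute force over all descriptor combinations in order of increasing
    size, so any smaller generator of the same extent is seen first.
    """
    descriptors = sorted(set().union(*data.values()))
    by_extent = {}
    for size in range(1, len(descriptors) + 1):
        for combo in combinations(descriptors, size):
            candidate = frozenset(combo)
            ext = extent(candidate, data)
            if not ext:
                continue  # covers no gene, so it defines nothing
            gens = by_extent.setdefault(ext, [])
            # Keep the candidate only if no proper subset already
            # generates the same extent.
            if not any(g < candidate for g in gens):
                gens.append(candidate)
    return by_extent


def exact_redescriptions(data):
    """Pair up minimal generators that cover exactly the same genes."""
    found = []
    for ext, gens in minimal_generators(data).items():
        for left, right in combinations(gens, 2):
            found.append((left, right, ext))
    return found


for left, right, ext in exact_redescriptions(DATA):
    print(sorted(left), "<=>", sorted(right), "covers", sorted(ext))
```

On this toy data the descriptors `A` and `C` cover the same three genes, so each is an alternative definition of that gene set; the same holds for the combinations `{A, B}` and `{B, C}`. Enumerating all combinations is exponential in the number of descriptors, which is why a practical miner would rely on closed-itemset machinery rather than this enumeration.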

Original language: English
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: R.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya
Pages: 364-373
Number of pages: 10
DOIs: https://doi.org/10.1145/1081870.1081912
Publication status: Published - 1 Dec 2005
Externally published: Yes
Event: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: 21 Aug 2005 to 24 Aug 2005

Other

Other: KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: Chicago, IL
Period: 21/8/05 to 24/8/05

Fingerprint

Association rules
Bioinformatics
Data mining
Genes

Keywords

  • Closed itemsets
  • Data mining
  • Minimal generators
  • Redescription

ASJC Scopus subject areas

  • Information Systems

Cite this

Zaki, M. J., & Ramakrishnan, N. (2005). Reasoning about sets using redescription mining. In R. L. Grossman, R. Bayardo, K. Bennett, & J. Vaidya (Eds.), Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 364-373) https://doi.org/10.1145/1081870.1081912

@inproceedings{38b03f1dc3df488ca9839246cc8fd087,
title = "Reasoning about sets using redescription mining",
abstract = "Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.",
keywords = "Closed itemsets, Data mining, Minimal generators, Redescription",
author = "Zaki, Mohammed J. and Ramakrishnan, Naren",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1081870.1081912",
language = "English",
pages = "364--373",
editor = "R.L. Grossman and R. Bayardo and K. Bennett and J. Vaidya",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Reasoning about sets using redescription mining

AU - Zaki, Mohammed J.

AU - Ramakrishnan, Naren

PY - 2005/12/1

Y1 - 2005/12/1

N2 - Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.

KW - Closed itemsets

KW - Data mining

KW - Minimal generators

KW - Redescription

UR - http://www.scopus.com/inward/record.url?scp=32344440051&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32344440051&partnerID=8YFLogxK

U2 - 10.1145/1081870.1081912

DO - 10.1145/1081870.1081912

M3 - Conference contribution

SP - 364

EP - 373

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Grossman, R.L.

A2 - Bayardo, R.

A2 - Bennett, K.

A2 - Vaidya, J.

ER -