Sampling frequent and minimal boolean patterns: theory and application in classification

Geng Li, Mohammed J. Zaki

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns in disjunctive normal form (DNF). We propose a novel theoretical characterization of the minimal DNF expressions, which allows us to prune the pattern search space effectively. Our approach can provide a near-uniform sample of the minimal DNF patterns. We perform an extensive set of experiments to demonstrate the effectiveness of our sampling method. We also show that minimal DNF patterns make effective features for classification.
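To make the terminology concrete, here is a minimal sketch of what evaluating a DNF pattern over a small categorical dataset could look like, together with one plausible reading of "minimal": no clause and no single item can be dropped without changing the set of matching records. The toy dataset, the function names `tidset` and `is_minimal`, and this particular minimality test are illustrative assumptions, not the authors' definitions or algorithm; the paper's exact characterization may differ, and no sampling is attempted here.

```python
# Hypothetical toy dataset: record id -> set of items present in that record.
DATA = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"b", "c"},
    4: {"a", "b", "c"},
}


def tidset(dnf, data=DATA):
    """Return the ids of records that satisfy a DNF pattern.

    `dnf` is a list of clauses; each clause is a set of items read as a
    conjunction, and the clauses are joined by OR.
    """
    return frozenset(
        rid for rid, items in data.items()
        if any(clause <= items for clause in dnf)
    )


def is_minimal(dnf, data=DATA):
    """Assumed minimality test: no strictly simpler expression obtained by
    dropping one clause or one item matches exactly the same records."""
    target = tidset(dnf, data)

    # Try dropping each whole clause.
    for i in range(len(dnf)):
        reduced = dnf[:i] + dnf[i + 1:]
        if reduced and tidset(reduced, data) == target:
            return False

    # Try dropping each single item from each clause.
    for i, clause in enumerate(dnf):
        for item in clause:
            smaller = clause - {item}
            if smaller:
                reduced = dnf[:i] + [smaller] + dnf[i + 1:]
                if tidset(reduced, data) == target:
                    return False
    return True
```

For example, under these assumptions the pattern `(a AND b)` is minimal on the toy data, while `a OR (a AND b)` is not, because the second clause can be dropped without changing which records match.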

Original language: English
Pages (from-to): 181-225
Number of pages: 45
Journal: Data Mining and Knowledge Discovery
Volume: 30
Issue number: 1
DOI: 10.1007/s10618-015-0409-y
Publication status: Published - 1 Jan 2016

Fingerprint

Sampling
Set theory
Experiments

Keywords

  • Classification
  • Disjunctive patterns
  • Frequent pattern mining
  • Markov chain Monte Carlo
  • Minimal Boolean expressions
  • Minimal generators
  • Pattern sampling

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Sampling frequent and minimal boolean patterns: theory and application in classification. / Li, Geng; Zaki, Mohammed J.

In: Data Mining and Knowledge Discovery, Vol. 30, No. 1, 01.01.2016, p. 181-225.

@article{9a6db01cd6644fa28487b0a7efc175d0,
title = "Sampling frequent and minimal boolean patterns: theory and application in classification",
abstract = "We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns in disjunctive normal form (DNF). We propose a novel theoretical characterization of the minimal DNF expressions, which allows us to prune the pattern search space effectively. Our approach can provide a near-uniform sample of the minimal DNF patterns. We perform an extensive set of experiments to demonstrate the effectiveness of our sampling method. We also show that minimal DNF patterns make effective features for classification.",
keywords = "Classification, Disjunctive patterns, Frequent pattern mining, Markov chain Monte Carlo, Minimal Boolean expressions, Minimal generators, Pattern sampling",
author = "Li, Geng and Zaki, Mohammed J.",
year = "2016",
month = "1",
day = "1",
doi = "10.1007/s10618-015-0409-y",
language = "English",
volume = "30",
pages = "181--225",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "1",
}

TY - JOUR

T1 - Sampling frequent and minimal boolean patterns

T2 - theory and application in classification

AU - Li, Geng

AU - Zaki, Mohammed J.

PY - 2016/1/1

Y1 - 2016/1/1

AB - We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns in disjunctive normal form (DNF). We propose a novel theoretical characterization of the minimal DNF expressions, which allows us to prune the pattern search space effectively. Our approach can provide a near-uniform sample of the minimal DNF patterns. We perform an extensive set of experiments to demonstrate the effectiveness of our sampling method. We also show that minimal DNF patterns make effective features for classification.

KW - Classification

KW - Disjunctive patterns

KW - Frequent pattern mining

KW - Markov chain Monte Carlo

KW - Minimal Boolean expressions

KW - Minimal generators

KW - Pattern sampling

UR - http://www.scopus.com/inward/record.url?scp=84953839895&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953839895&partnerID=8YFLogxK

U2 - 10.1007/s10618-015-0409-y

DO - 10.1007/s10618-015-0409-y

M3 - Article

VL - 30

SP - 181

EP - 225

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 1

ER -