MUSK: Uniform sampling of k maximal patterns

Mohammad Al Hasan, Mohammed Zaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Recent research in frequent pattern mining (FPM) has shifted from obtaining the complete set of frequent patterns to generating only a representative (summary) subset of frequent patterns. Most of the existing approaches to this problem adopt a two-step solution: in the first step, they obtain all the frequent patterns, and in the second step, some form of clustering is used to obtain the summary pattern set. However, the two-step method is inefficient and sometimes infeasible, since the first step itself may fail to finish in a reasonable amount of time. In this paper, we propose an alternative approach to mining frequent pattern representatives based on uniform sampling of the output space. Our new algorithm, MUSK, obtains representative patterns by sampling uniformly from the pool of all frequent maximal patterns; uniformity is achieved by a variant of the Markov Chain Monte Carlo (MCMC) algorithm. MUSK simulates a random walk on the frequent pattern partial order graph with a prescribed transition probability matrix, whose values are computed locally during the simulation. In the stationary distribution of the random walk, all maximal frequent pattern nodes in the partial order graph are sampled uniformly. Experiments on various kinds of graph and itemset databases validate the effectiveness of our approach.
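The random-walk idea in the abstract can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not the authors' transition matrix or implementation: it uses the standard Metropolis-Hastings acceptance rule with a uniformly chosen neighbor as the proposal, a hypothetical toy database `TRANSACTIONS` with a hypothetical threshold `MINSUP = 2`, and an assumed target weight of 1 for maximal patterns and `eps` for every other frequent pattern. Neighbors (frequent patterns that differ by one item) are computed on demand, echoing the abstract's point that transition probabilities are computed locally rather than from a fully enumerated pattern set.

```python
import random
from collections import Counter
from functools import lru_cache

# ASSUMPTION: hypothetical toy itemset database and support threshold.
TRANSACTIONS = [frozenset("abc"), frozenset("abc"), frozenset("ce"),
                frozenset("ce"), frozenset("d"), frozenset("d")]
MINSUP = 2
ITEMS = sorted(set().union(*TRANSACTIONS))


def support(pattern: frozenset) -> int:
    """Number of transactions containing every item of `pattern`."""
    return sum(1 for t in TRANSACTIONS if pattern <= t)


@lru_cache(maxsize=None)
def neighbors(pattern: frozenset) -> tuple:
    """Frequent patterns differing from `pattern` by exactly one item.

    Computed on demand; the partial order graph is never materialized.
    Sub-patterns need no support check: every subset of a frequent
    pattern is frequent (anti-monotonicity).
    """
    result = []
    for item in ITEMS:
        if item in pattern:
            result.append(pattern - {item})
        else:
            sup = pattern | {item}
            if support(sup) >= MINSUP:
                result.append(sup)
    return tuple(result)


def is_maximal(pattern: frozenset) -> bool:
    """Maximal iff no one-item extension is frequent."""
    return all(len(n) < len(pattern) for n in neighbors(pattern))


def weight(pattern: frozenset, eps: float) -> float:
    """ASSUMED unnormalized target: 1 for maximal patterns, eps otherwise."""
    return 1.0 if is_maximal(pattern) else eps


def sample_maximal(steps: int, eps: float = 0.5, seed: int = 0) -> Counter:
    """Metropolis-Hastings walk; returns visit counts of maximal patterns.

    Proposal: a uniformly chosen neighbor. Acceptance:
    min(1, w(y) * deg(x) / (w(x) * deg(y))), so the stationary
    distribution is proportional to w and uniform over maximal patterns.
    """
    rng = random.Random(seed)
    current = frozenset()  # the empty pattern is always frequent
    visits = Counter()
    for _ in range(steps):
        nbrs = neighbors(current)
        proposal = rng.choice(nbrs)
        ratio = (weight(proposal, eps) * len(nbrs)) / (
            weight(current, eps) * len(neighbors(proposal)))
        if rng.random() < min(1.0, ratio):
            current = proposal
        if is_maximal(current):
            visits["".join(sorted(current))] += 1
    return visits


if __name__ == "__main__":
    counts = sample_maximal(100_000)
    total = sum(counts.values())
    for pat, c in sorted(counts.items()):
        print(pat, round(c / total, 3))
```

On this toy database the maximal frequent patterns are `abc`, `ce` and `d`, and each should receive roughly one third of the maximal-pattern visits. A plain random walk that accepts every proposal would instead visit nodes in proportion to their degree, favoring `abc`, `ce` and `d` in about a 3:2:1 ratio; the acceptance rule is what removes that bias. A smaller `eps` puts more stationary mass on maximal patterns but makes it harder for the walk to cross non-maximal nodes, which slows mixing.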

Original language: English
Title of host publication: Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Pages: 646-657
Number of pages: 12
Volume: 2
Publication status: Published - 31 Dec 2009
Externally published: Yes
Event: 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States
Duration: 30 Apr 2009 - 2 May 2009

Other

Other: 9th SIAM International Conference on Data Mining 2009, SDM 2009
Country: United States
City: Sparks, NV
Period: 30/4/09 - 2/5/09

Fingerprint

Frequent Pattern
Sampling
Frequent Pattern Mining
Markov processes
Partial Order
Random walk
Graph in graph theory
Transition Probability Matrix
Two-step Method
Markov Chain Monte Carlo Algorithms
Stationary Distribution
Uniformity
Experiments
Clustering
Subset
Output
Alternatives
Vertex of a graph
Experiment
Simulation

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software
  • Applied Mathematics

Cite this

Al Hasan, M., & Zaki, M. (2009). MUSK: Uniform sampling of k maximal patterns. In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics (Vol. 2, pp. 646-657).

@inproceedings{8948f267c91b44439cad901f095ee675,
title = "{MUSK}: Uniform sampling of k maximal patterns",
author = "{Al Hasan}, Mohammad and Mohammed Zaki",
year = "2009",
month = "12",
day = "31",
language = "English",
isbn = "9781615671090",
volume = "2",
pages = "646--657",
booktitle = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics",

}

TY - GEN

T1 - MUSK: Uniform sampling of k maximal patterns

AU - Al Hasan, Mohammad

AU - Zaki, Mohammed

PY - 2009/12/31

Y1 - 2009/12/31

UR - http://www.scopus.com/inward/record.url?scp=72749121831&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72749121831&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781615671090

VL - 2

SP - 646

EP - 657

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

ER -