### Abstract

Recent research in frequent pattern mining (FPM) has shifted from obtaining the complete set of frequent patterns to generating only a representative (summary) subset of frequent patterns. Most existing approaches to this problem adopt a two-step solution: in the first step, they obtain all the frequent patterns, and in the second step, some form of clustering is used to obtain the summary pattern set. However, the two-step method is inefficient and sometimes infeasible, since the first step itself may fail to finish in a reasonable amount of time. In this paper, we propose an alternative approach to mining frequent pattern representatives based on uniform sampling of the output space. Our new algorithm, MUSK, obtains representative patterns by sampling uniformly from the pool of all maximal frequent patterns; uniformity is achieved by a variant of the Markov Chain Monte Carlo (MCMC) algorithm. MUSK simulates a random walk on the frequent pattern partial order graph with a prescribed transition probability matrix, whose values are computed locally during the simulation. In the stationary distribution of the random walk, all maximal frequent pattern nodes in the partial order graph are sampled uniformly. Experiments on various kinds of graph and itemset databases validate the effectiveness of our approach.
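To make the random-walk idea concrete, the following is a minimal sketch for itemsets only. It is not MUSK's transition matrix, which the abstract does not spell out. Instead it uses a generic lazy Metropolis-Hastings walk over the frequent itemset partial order graph, whose stationary distribution is uniform over all frequent itemsets, and records a visit only when the current node is maximal; conditioned on maximality, those visits are uniform over the maximal patterns. The names `support`, `neighbors`, and `sample_maximal`, and the toy database, are assumptions made for illustration.

```python
import random
from collections import Counter


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def neighbors(node, items, transactions, minsup):
    """Adjacent nodes in the partial order graph, computed locally.

    `down` drops one item (always frequent by downward closure);
    `up` adds one item and keeps only frequent supersets.
    """
    down = [node - {i} for i in node]
    up = [node | {i} for i in items - node
          if support(node | {i}, transactions) >= minsup]
    return down, up


def sample_maximal(transactions, minsup, steps, seed=0):
    """Count visits to maximal frequent itemsets during a lazy
    Metropolis-Hastings walk (illustrative stand-in, not MUSK itself)."""
    rng = random.Random(seed)
    items = frozenset().union(*transactions)
    node = frozenset()            # start at the empty pattern
    visits = Counter()
    for _ in range(steps):
        down, up = neighbors(node, items, transactions, minsup)
        nbrs = down + up
        if not nbrs:              # no frequent item at all: nothing to walk
            break
        if not up:                # no frequent superset => node is maximal
            visits[node] += 1
        if rng.random() < 0.5:    # lazy step keeps the chain aperiodic
            continue
        cand = rng.choice(nbrs)   # propose a uniformly random neighbor
        cdown, cup = neighbors(cand, items, transactions, minsup)
        # Accept with min(1, deg(node) / deg(cand)) so that the
        # stationary distribution is uniform over all nodes.
        if rng.random() < min(1.0, len(nbrs) / (len(cdown) + len(cup))):
            node = cand
    return visits


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("abc"), frozenset("ab"),
          frozenset("cd"), frozenset("cd")]
    for pattern, count in sample_maximal(db, minsup=2, steps=50000).items():
        print(sorted(pattern), count)
```

This simple walk spends many steps on non-maximal nodes, so it only illustrates the principle that transition probabilities can be computed from local degree information without enumerating all frequent patterns first.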

Original language | English |
---|---|
Title of host publication | Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics |
Pages | 646-657 |
Number of pages | 12 |
Volume | 2 |
Publication status | Published - 31 Dec 2009 |
Externally published | Yes |
Event | 9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States; Duration: 30 Apr 2009 → 2 May 2009 |

### Other

Other | 9th SIAM International Conference on Data Mining 2009, SDM 2009 |
---|---|
Country | United States |
City | Sparks, NV |
Period | 30/4/09 → 2/5/09 |

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Software
- Applied Mathematics

### Cite this

*Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics* (Vol. 2, pp. 646-657)

**MUSK: Uniform sampling of k maximal patterns.** / Al Hasan, Mohammad; Zaki, Mohammed.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics.* vol. 2, pp. 646-657, 9th SIAM International Conference on Data Mining 2009, SDM 2009, Sparks, NV, United States, 30/4/09.

TY - GEN

T1 - MUSK

T2 - Uniform sampling of k maximal patterns

AU - Al Hasan, Mohammad

AU - Zaki, Mohammed

PY - 2009/12/31

Y1 - 2009/12/31

UR - http://www.scopus.com/inward/record.url?scp=72749121831&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72749121831&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781615671090

VL - 2

SP - 646

EP - 657

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

ER -