An integrated, generic approach to pattern mining: Data mining template library

Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, Mohammed J. Zaki

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, and the kind of back-end storage manager to use are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software ( http://dmtl.sourceforge.net/ ), and has been downloaded by numerous researchers from all over the world.
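
The property-driven design described in the abstract can be illustrated with a minimal C++ sketch. It is not DMTL's actual API: every name below (`proplist`, `has_property`, `memory_storage`, `frequent_items`, and the property tags) is hypothetical and chosen only for illustration. The sketch assumes that pattern types are described by compile-time lists of property tags and that the storage back-end is supplied as a template parameter, which is one common way to realize this kind of generic design.

```cpp
#include <cstddef>
#include <map>
#include <type_traits>
#include <vector>

// Hypothetical property tags (illustrative names, not DMTL identifiers).
struct directed {};
struct undirected {};
struct acyclic {};
struct no_edges {};

// A compile-time list of properties describing a pattern type.
template <typename... Props>
struct proplist {};

// has_property<P, List>::value is true when tag P appears in List.
template <typename P, typename List>
struct has_property;

template <typename P>
struct has_property<P, proplist<>> : std::false_type {};

template <typename P, typename Head, typename... Tail>
struct has_property<P, proplist<Head, Tail...>>
    : std::integral_constant<bool,
                             std::is_same<P, Head>::value ||
                                 has_property<P, proplist<Tail...>>::value> {};

// New pattern types are declared by combining property tags.
using itemset_props = proplist<no_edges>;
using tree_props = proplist<directed, acyclic>;
using graph_props = proplist<undirected>;

// Hypothetical in-memory storage policy; a persistent back-end would
// expose the same add/scan interface.
struct memory_storage {
    std::vector<std::vector<int>> rows;
    void add(const std::vector<int>& transaction) { rows.push_back(transaction); }
    const std::vector<std::vector<int>>& scan() const { return rows; }
};

// Generic miner: counts single items and keeps those with support >= minsup.
// Assumes each item occurs at most once per transaction.
template <typename Storage>
std::map<int, std::size_t> frequent_items(const Storage& db, std::size_t minsup) {
    std::map<int, std::size_t> counts;
    for (const auto& transaction : db.scan())
        for (int item : transaction) ++counts[item];
    for (auto it = counts.begin(); it != counts.end();) {
        if (it->second < minsup)
            it = counts.erase(it);  // drop infrequent items
        else
            ++it;
    }
    return counts;
}
```

In this sketch, swapping the back-end means passing a different type with the same `add`/`scan` interface to `frequent_items`, and a new pattern type is introduced by declaring another `proplist` alias; algorithms can then branch at compile time on checks such as `has_property<acyclic, tree_props>::value`.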

Original language: English
Pages (from-to): 457-495
Number of pages: 39
Journal: Data Mining and Knowledge Discovery
Volume: 17
Issue number: 3
DOI: 10.1007/s10618-008-0098-x
Publication status: Published - 1 Dec 2008
Externally published: Yes

Fingerprint

Data mining
Scalability
Managers
Specifications
Experiments

Keywords

  • Frequent pattern mining
  • Generic programming
  • Graph mining
  • Itemset mining
  • Sequence mining
  • Tree mining

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

An integrated, generic approach to pattern mining: Data mining template library. / Chaoji, Vineet; Al Hasan, Mohammad; Salem, Saeed; Zaki, Mohammed J.

In: Data Mining and Knowledge Discovery, Vol. 17, No. 3, 01.12.2008, p. 457-495.

Chaoji, Vineet ; Al Hasan, Mohammad ; Salem, Saeed ; Zaki, Mohammed J. / An integrated, generic approach to pattern mining : Data mining template library. In: Data Mining and Knowledge Discovery. 2008 ; Vol. 17, No. 3. pp. 457-495.
@article{d0763d2827654f228dd34c99f232488c,
title = "An integrated, generic approach to pattern mining: Data mining template library",
abstract = "Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, and the kind of back-end storage manager to use are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software ( http://dmtl.sourceforge.net/ ), and has been downloaded by numerous researchers from all over the world.",
keywords = "Frequent pattern mining, Generic programming, Graph mining, Itemset mining, Sequence mining, Tree mining",
author = "Vineet Chaoji and {Al Hasan}, Mohammad and Saeed Salem and Zaki, {Mohammed J.}",
year = "2008",
month = "12",
day = "1",
doi = "10.1007/s10618-008-0098-x",
language = "English",
volume = "17",
pages = "457--495",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "3",

}

TY - JOUR

T1 - An integrated, generic approach to pattern mining

T2 - Data mining template library

AU - Chaoji, Vineet

AU - Al Hasan, Mohammad

AU - Salem, Saeed

AU - Zaki, Mohammed J.

PY - 2008/12/1

Y1 - 2008/12/1

AB - Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, and the kind of back-end storage manager to use are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software ( http://dmtl.sourceforge.net/ ), and has been downloaded by numerous researchers from all over the world.

KW - Frequent pattern mining

KW - Generic programming

KW - Graph mining

KW - Itemset mining

KW - Sequence mining

KW - Tree mining

UR - http://www.scopus.com/inward/record.url?scp=54249105070&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=54249105070&partnerID=8YFLogxK

U2 - 10.1007/s10618-008-0098-x

DO - 10.1007/s10618-008-0098-x

M3 - Article

AN - SCOPUS:54249105070

VL - 17

SP - 457

EP - 495

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 3

ER -