Efficiently mining frequent trees in a forest

Algorithms and applications

Mohammed J. Zaki

Research output: Contribution to journal (Article)

212 Citations (Scopus)

Abstract

Mining frequent trees is useful in domains such as bioinformatics, Web mining, and the mining of semistructured data. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm that discovers all frequent subtrees in a forest using a new data structure called a scope-list. We contrast TREEMINER with a pattern-matching tree mining algorithm (PATTERNMATCHER), and we also compare it with TREEMINERD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from the bioinformatics domain.
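The abstract does not spell out what an embedded subtree occurrence is, so the following is only a minimal brute-force sketch under stated assumptions, not TREEMINER itself. It assumes that an embedded occurrence maps pattern edges to ancestor-descendant pairs (not only parent-child pairs) and preserves left-to-right sibling order, and that a node's scope is the interval from its own preorder position to that of its rightmost descendant. The names `flatten`, `count_embeddings`, and `support` are hypothetical and chosen for illustration.

```python
# Brute-force illustration of embedded, ordered subtree counting.
# Assumption: a tree is a (label, [children]) pair; this is NOT the
# scope-list join used by TREEMINER, only a direct matcher.

def flatten(tree):
    """Return [(label, start, end)] in preorder.

    [start, end] is the assumed scope of the node: its own preorder
    position and the position of its rightmost descendant.
    """
    nodes = []

    def visit(node):
        label, children = node
        idx = len(nodes)
        nodes.append(None)              # reserve the preorder slot
        for child in children:
            visit(child)
        nodes[idx] = (label, idx, len(nodes) - 1)

    visit(tree)
    return nodes


def count_embeddings(pattern, tree):
    """Count all embedded, order-preserving occurrences of pattern in tree."""
    t_nodes = flatten(tree)

    def match(p_node, t_idx):
        # Ways to embed p_node's subtree with its root mapped to t_idx.
        p_label, p_children = p_node
        t_label, start, end = t_nodes[t_idx]
        if p_label != t_label:
            return 0
        return place(p_children, start + 1, end)

    def place(p_children, lo, hi):
        # Ways to map ordered siblings into positions lo..hi so that each
        # image lies strictly after the scope of the previous sibling's image.
        if not p_children:
            return 1
        first, rest = p_children[0], p_children[1:]
        total = 0
        for i in range(lo, hi + 1):
            ways = match(first, i)
            if ways:
                total += ways * place(rest, t_nodes[i][2] + 1, hi)
        return total

    return sum(match(pattern, i) for i in range(len(t_nodes)))


def support(pattern, forest):
    """Return (trees containing the pattern, total occurrences)."""
    counts = [count_embeddings(pattern, t) for t in forest]
    return sum(1 for c in counts if c > 0), sum(counts)


forest = [
    ("A", [("B", []), ("C", [("B", [])])]),   # A -> B occurs twice (embedded)
    ("A", [("C", [])]),                       # no occurrence
    ("B", [("A", [("B", [])])]),              # one occurrence
]
pattern = ("A", [("B", [])])
print(support(pattern, forest))
```

In the first tree the pattern A -> B matches both the direct child B and the B nested under C, which is what distinguishes embedded from induced (parent-child only) matching. The two numbers returned by `support` illustrate the difference between counting each tree once and counting every occurrence.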

Original language: English
Pages (from-to): 1021-1035
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 17
Issue number: 8
DOI: 10.1109/TKDE.2005.125
Publication status: Published - 1 Aug 2005
Externally published: Yes

Fingerprint

Bioinformatics
Pattern matching
RNA
Data mining
Data structures
Scalability
Experiments

Keywords

  • Data mining
  • Frequent tree mining
  • Labeled trees
  • Ordered
  • Pattern matching
  • Phylogenetic trees
  • RNA structure
  • Rooted
  • Subtree enumeration

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Efficiently mining frequent trees in a forest : Algorithms and applications. / Zaki, Mohammed J.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 8, 01.08.2005, p. 1021-1035.

@article{7a017009a4874250bc7c0ce440b6c304,
title = "Efficiently mining frequent trees in a forest: Algorithms and applications",
abstract = "Mining frequent trees is very useful in domains like bioinformatics, Web mining, mining semistructured data, etc. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER), and we also compare it with TREEMINERD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from bioinformatics domain.",
keywords = "Data mining, Frequent tree mining, Labeled trees, Ordered, Pattern matching, Phylogenetic trees, RNA structure, Rooted, Subtree enumeration",
author = "Zaki, {Mohammed J.}",
year = "2005",
month = "8",
day = "1",
doi = "10.1109/TKDE.2005.125",
language = "English",
volume = "17",
pages = "1021--1035",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "8",
}

TY - JOUR

T1 - Efficiently mining frequent trees in a forest

T2 - Algorithms and applications

AU - Zaki, Mohammed J.

PY - 2005/8/1

Y1 - 2005/8/1

AB - Mining frequent trees is very useful in domains like bioinformatics, Web mining, mining semistructured data, etc. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER), and we also compare it with TREEMINERD, which counts only distinct occurrences of a pattern. We conduct detailed experiments to test the performance and scalability of these methods. We also use tree mining to analyze RNA structure and phylogenetics data sets from bioinformatics domain.

KW - Data mining

KW - Frequent tree mining

KW - Labeled trees

KW - Ordered

KW - Pattern matching

KW - Phylogenetic trees

KW - RNA structure

KW - Rooted

KW - Subtree enumeration

UR - http://www.scopus.com/inward/record.url?scp=24344486868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=24344486868&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2005.125

DO - 10.1109/TKDE.2005.125

M3 - Article

VL - 17

SP - 1021

EP - 1035

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 8

ER -